What Language Models Don't Do For Us
The only thing larger than language models is my disappointment.
Large Language Models (LLMs) are a suitable contender, in my view, for the most fascinating scientific artefact of this century so far. This is not an original view. Nor was this outcome by design: my understanding is that if there was any application associated with the early days of experimentation with the transformer architecture, it was translation. Most of what has happened in the intervening period is quite apart from these original aims.
As the commercial world deals with the still-uncertain impacts of LLMs, areas of cognitive science and philosophy have become flooded with papers arguing over the implications of a linguistically fluent and coherent machine for theories of learnability, language acquisition, language processing, cognitive architecture, meaning, agency, free will, and the necessary and/or sufficient behavioral criteria and architectural underpinnings of consciousness. I was particularly struck to see Raphael Millière, co-author with Cameron Buckner of this recently published survey on the philosophy of language models, refer to this form of study as a “field” in promoting the article.
I cannot speak for the full gamut of issues - familiar readers will know that my tiny contribution to these debates concerns foundational questions in the philosophy of cognitive science and questions about the theoretical significance of computational modeling to human language acquisition; I do not have much to independently argue in the way of, say, consciousness.
But, it seems, neither does anyone else. The significance of LLMs to these questions is often misconceived, in my view, as a way of raising new, “big” questions about the nature of mind and cognition. LLMs must instead be understood as motivations for asking these questions, which had already been established, if contentious, for many years prior to the development of LLMs.
So rich was the academic output from the past several centuries (and further, for certain questions1) that the most LLMs can hope to do is give the “big” problems a feel they otherwise lacked, perhaps contrasting with the hypothetical examples that dominated philosophical arguments previously.2 Indeed, Millière and Buckner in their aforementioned piece begin their lengthy essay with a reference to Descartes’ exceptionally brief remarks on language - “Almost four centuries later, [LLMs] appear to challenge Descartes’ claim” - while proceeding to neglect any subsequent engagement with the Cartesian view and its descendants, a mainstay of recent papers that I lamented here in November.
I have, in any event, felt increasingly disappointed by the academic output on a sampling of these questions. My disappointment stems from (what I see as) four trends that have intensified, rather than waned, as one might have expected. These trends often begin in the commercial world and slither into ostensibly sober-minded analysis:
(1) Large Language Models have largely provided fodder for the views - or the underlying logic of certain views - previously held by authors.
Part of this owes to what I just mentioned; language models do not themselves raise the “big” questions about mind and cognition. They motivate them in new ways. As a result, their use has been to intervene on debates that had been underway for years, if not centuries.
As an example close to home, generative and computational linguists are talking past one another in their respective dealings on the significance of LLMs as much as they had before LLMs - to some extent, knowingly, but as I have argued, LLMs themselves cannot bridge the gaps between these approaches to language.
More specifically, I struggle to find the link between many computational linguists’ insistence that LLMs acquire language (itself contestable, dependent on one’s conception of language, itself frequently ignored) and the conclusion that LLMs are therefore instructive about human language acquisition. Even if we grant the premise, the conclusion seems questionable. An intervening argument about why a computational model that replicates certain behavioral outputs is tantamount to an explanation for human language acquisition is required to allow the conclusion to reasonably flow from the premise. I have questioned this reasoning at length elsewhere, identifying it as a problem in theory-formation going well beyond the topical example of LLMs (though, again, motivated by them). And, in the interest of fairness, I suspect some generative linguists now recognize that their previous comments - I am specifically thinking of the assumptions behind such comments - on the viability of computational modeling for linguistic theory and its implications for foundational matters like Universal Grammar were deficient.3 (I might draw attention to various contributions on this matter, however, ahem ahem.)
Cynicism is not an argument, speaking of, and one must be cautious about imposing motivations on others that they do not hold. However, I will trust that the reader has some familiarity with the bizarre dynamic between Noam Chomsky and his critics in the field, and will not find me too disagreeable when I suggest LLMs are being used in furtherance of this dynamic at least as much as they are in furtherance of interesting ideas.
(2) Debates about Large Language Models increasingly operate at a “meta” level, in which participants hold dramatically different mental images of them.
At some point, enough people were able to crystallize their mental images of what an LLM “is” and what kinds of capabilities it provides, such that we could say their minds were made up. Although the pro vs. anti camps do get you a solid way towards understanding someone’s views, people’s mental images about LLMs differ substantially. (Contrary to what some on the “pro” side might argue, I do not believe this is for lack of interaction with LLMs, necessarily. People think differently.)
Reading academic output on LLMs, one senses that scientists/researchers and philosophers alike - though, perhaps more the former - are constructing experiments on LLMs with assumptions that derive more from their mental images of LLMs than from any clear connection to explanatory theory, or anchors in engagement with those “big” questions from years past. This has knock-on effects, too: “rates of progress” become notions that live in one’s mind, not grounded in some shared and publicly specified framework, which helps account for the shock some express over the lackadaisical attitudes of others about LLM-this or LLM-that.
(3) Many of the debates about LLMs are beginning to feel forced.
Over the past few days, Anthropic released a Mythos-class model dubbed “Fable,” complete with a policy indicating that they would covertly nerf the model’s capabilities (ostensibly to prevent industry competitors from leveraging it for their own purposes). The policy was walked back on June 11th after an uproar.
I understand the concerns people have about this policy and Anthropic’s continuing ability to implement such a policy without users’ awareness (or consent), and these grievances were naturally linked to CEO Dario Amodei’s “Policy on the AI Exponential” blog post. Nevertheless, the previous two trends interacted with the reaction to this, as one’s rage depends in large part on whether one considers LLMs sufficiently capable (however conceived) and whether one regularly uses Anthropic’s products in one’s workflows.
That said - and, really, those are perfectly legitimate concerns - much of this is beginning to feel forced.
The constant need to run every professional decision, or every interesting question into which philosophers and scientists have thrown themselves over the years, through the lens of an LLM’s capabilities is exhausting. Recall how I started this piece. I share the interest. Surely, however, if “AI fatigue” means anything, it means that there is too much of this. That it is interesting does not mean I need more of it.
The gripe that led me to write this piece is this: otherwise intelligent and perhaps well-meaning people are using the possibility that LLMs will cause social and economic transformations like mass unemployment as a means of sustaining personas they have crafted around them. I will be sufficiently vague here, but you have doubtless come across this on your own; the person who seems too eager to tell you about our impending obsolescence, or too adamant about whether an LLM has X, Y, or Z capability. All in the name of safety and genuine concern, though the concern has a peculiarly aggressive vibe. None of this works without the possibility of everything changing hanging over the heads of those in the discourse. Otherwise, the rest of us might move on.
This is disappointing to me, in a way I have some difficulty expressing. It might help to note that grifters are only one side of this, and not the side I am thinking of here.
(4) Separating industry from science works in principle, but not in practice.
That brings us to the final trend. You will often hear the following: yes, the AI industry is grotesque, but we have to separate those who own and develop LLMs from the technology itself as we try to tease out their implications for science and philosophy.
This is correct in principle. In practice, it is nearly impossible to do with rigor and consistency. The “forced” nature of many debates about LLMs is dependent on the hype. More specifically, to ensure that one entertains the views vis-a-vis LLMs of those who might otherwise have difficulty gaining public traction, LLMs must be understood as “inevitable” or “too capable to be in denial about.”
Those of us involved in these discussions have become accustomed to viewing these as separate matters: the commercial dynamics, on the one hand, and the scientific dynamics, on the other. But, again, this is only true in principle. In practice, LLMs have subsumed collective attention in ways that I struggle to see as justified, despite how I began this piece. The reason for this traces back to the cloud of doubt hovering over the heads of those involved in these various conversations.
Academia is not affected by this in quite the same way as the commercial world. But it is affected by it, which is why the fields I mentioned, like cognitive science and philosophy of mind, now have to attend to LLMs. And it is possible that I am part of the problem, as I have written... quite a bit on LLMs and cognitive science. Again, recall the start of this piece.
In any event, LLMs do not solve our problems. Even if this technology spurred a successful transition to a post-scarcity world where everyone’s material needs are met,4 our problems would still be with us.
That’s all. Anything else would just be rambling.
1. Chomsky didn’t call it “Plato’s problem” for nothing.
2. This is no small thing, of course. Though, I think if philosophers do anything well, it’s constructing hypothetical examples that better serve the aim of answering one of the “big” questions. Sometimes idealization is the quickest way to better understanding.
3. As the linked, forthcoming piece argues, I believe the “existence proof” arguments for LLMs rest on an assumption that LLMs “learn” language through exposure to data, and that this is comparable to a human infant’s learning. I argue this is false, and that a child’s ability to acquire language remains something of a mystery, not aided by the alleged existence proof. The deficiency of the generative linguist is in failing to articulate the scope of computational modeling’s theoretical usefulness, rather than taking the (also deficient) stance of a variant of usage-based linguistics.

