Why Google’s AI Overviews Can’t Spell Google, Trump or Strawberry
Google’s AI Overviews keep returning the wrong number of letters for common words like Google, poop, and journalism, and the wrong spelling of the U.S. president’s last name. The cause traces back to the language model under the AI Overview layer, which never reads letters at all. It reads tokens, numerical fragments of text that every major AI model is built on.
Google told TechCrunch on Wednesday that “counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue.” Researchers who study that exact challenge say there is no perfect fix. And the system producing the answers now appears on roughly one in four U.S. Google search queries.
The Words Google Couldn’t Spell
Screenshots of Google’s AI Overview misspelling common words have piled up across the past week. TechCrunch tested the system and got back the wrong count for the letter P in Google, the wrong number of R’s in poop, an extra D inside journalism, and a scrambled version of Trump. The errors landed days after a separate flaw broke the system’s dictionary feature entirely.
Most of the misspellings share a pattern. They swap a letter inside the word rather than at its edge, and they fall apart most when the word contains repeats.
| Word queried | What the AI Overview returned | Correct answer |
|---|---|---|
| Two P’s | One P | |
| poop | One R | Zero R’s |
| journalism | j-o-u-r-n-a-d-i-s-m | j-o-u-r-n-a-l-i-s-m |
| Trump | t-r-p-u-m | t-r-u-m-p |
The journalism answer is the cleanest illustration of what is going wrong. Google’s AI claimed the word contains two D’s, then printed a version with one D substituted for the L. Both the count and the spelling were wrong, and they were wrong in different directions inside the same response.
Why Transformers Can’t See Letters
AI language models do not read text the way a child learning to spell does. Before any computation happens, the input passes through a tokenizer, a program that chops words into recurring fragments and converts each fragment into a number.
From Letters to Numbers
Most production models, including the systems powering Google’s AI Overviews, use a method called byte-pair encoding. The technique builds its vocabulary by scanning huge volumes of training text, merging the most common letter pairs into tokens, then merging the most common pairs of tokens into longer fragments, and so on. The word “strawberry” becomes the three tokens “st”, “raw” and “berry” in OpenAI’s GPT-4o tokenizer. The model never sees the ten characters. It sees three numbers.
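The merge loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the tokenizer Google or OpenAI ships: the function name `train_bpe`, the tiny corpus in the usage note, and the number of merges are all invented for the example.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Toy byte-pair encoding trainer (illustrative only).

    Each word starts as a tuple of single characters. On every round the
    most frequent adjacent pair is fused into one symbol. Returns the
    ordered list of merges and the final segmentation counts.
    """
    # Count each distinct word, stored as a tuple of symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab
```

Calling `train_bpe(["low", "low", "lower"], 2)` on this made-up corpus first fuses `l` and `o`, then `lo` and `w`, so the frequent word “low” ends up as a single symbol. After that point nothing in the representation records that it was once three letters.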
Matthew Guzdial, an assistant professor of computing science at the University of Alberta, summarized the gap for TechCrunch.
“LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding. When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”
What Tokens Throw Away
That trade-off is deliberate. Treating every character as its own input would produce sequences four to five times longer, which would multiply the cost of training and running each query. The compromise lets a model fit longer passages into the same compute budget. The price it pays is the inability to reliably introspect its own input at the letter level. Ask the model to count R’s in strawberry and it must reconstruct, from the meanings of “st”, “raw” and “berry”, a property the architecture removed at step one.
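A minimal sketch of why the letter count is not directly readable from the model's input. The three-way split of “strawberry” is the one cited above; the integer IDs and the `TOY_VOCAB` table are invented for illustration and do not match any real tokenizer.

```python
# Hypothetical vocabulary: the fragments come from the article,
# the integer IDs are made up for this example.
TOY_VOCAB = {"st": 101, "raw": 202, "berry": 303}
ID_TO_TEXT = {token_id: text for text, token_id in TOY_VOCAB.items()}


def encode(pieces):
    """Map a list of subword fragments to their (made-up) integer IDs."""
    return [TOY_VOCAB[piece] for piece in pieces]


def count_letter_via_decode(token_ids, letter):
    """Count a letter by first turning the IDs back into characters.

    The lookup in ID_TO_TEXT is the step a language model cannot perform
    directly: it has to infer the spelling of each fragment from training.
    """
    text = "".join(ID_TO_TEXT[token_id] for token_id in token_ids)
    return text.count(letter)


token_ids = encode(["st", "raw", "berry"])  # three integers, no letters
r_count = count_letter_via_decode(token_ids, "r")
```

Here the input shrinks from ten characters to three integers, which is the saving the trade-off buys. The count of R's only becomes available again once the IDs are mapped back to text.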
The Strawberry Problem Has a Long Tail
The pattern has been documented long enough to have a nickname. AI researchers call it the strawberry problem, after a question that became a running joke: ask a model how many R’s are in strawberry and the most common answer is two, not three. The error survived multiple major model releases through 2024 and 2025, and it now sits inside the system Google serves to its largest audience.
What the Counting Study Found
A December 2024 study on letter counting in language models tested thirteen popular LLMs and found that most failed on words where a letter appeared more than twice. The strongest correlation was not with how often the word appeared in training data, the authors wrote, but with the complexity of the counting operation itself. Models could recognize each letter in isolation. They could not reliably tally repeats inside a single token.
Sheridan Feucht, a PhD student at Northeastern University who studies how large language models build internal representations, was not optimistic about a clean solution. “It’s kind of hard to get around the question of what exactly a ‘word’ should be for a language model,” she told TechCrunch, adding that even if researchers agreed on “a perfect token vocabulary, models would probably still find it useful to chunk things even further.” Her conclusion: “My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”
The Token-Free Alternatives
Alternative architectures exist on the research bench. MambaByte, a token-free state space model published in 2024, processes raw bytes without any tokenizer at all and matches the performance of subword transformers on several benchmarks while staying more robust to corrupted input. Character-level models like CANINE and Charformer have shown similar strengths. None of them powers a consumer search engine that serves billions of queries a day. The cost and latency advantages of subword tokenization remain too large to give up at production scale, which means the bug stays embedded in the foundation.
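As a point of comparison with subword tokens, the input a byte-level model works from can be sketched with Python's built-in UTF-8 encoding. This only illustrates the input representation; it is not the preprocessing code of any model named above, and the helper name `to_byte_ids` is hypothetical.

```python
def to_byte_ids(text):
    """Return the UTF-8 byte values of `text`, one integer per byte."""
    return list(text.encode("utf-8"))


byte_ids = to_byte_ids("strawberry")

# Every letter keeps its own position in the sequence, so each "r"
# is still individually visible as the byte value ord("r").
r_count = byte_ids.count(ord("r"))
sequence_length = len(byte_ids)
```

The sequence is ten items long instead of three, which is the cost side of the trade: more positions to process for the same word.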
Two Billion Users, One Architectural Limit
The errors would matter less if AI Overviews were a niche feature. They are not. Google’s AI-generated answer boxes now reach roughly two billion users a month, surface on more than a quarter of U.S. queries tracked by BrightEdge, and climb past 50% prevalence on long-tail questions seven words or longer.
The same tracking shows AI Overviews are doing what Google promised they would. They are keeping users on the search page. When an Overview is present, the click-through rate to the top organic result falls sharply, which means the AI-generated text is increasingly the only answer a user reads before moving on. Google has continued to position the feature as the centerpiece of its product roadmap, including in the company’s Search updates at I/O 2026.
- 2 billion monthly users now see an AI Overview, Google’s biggest deployment of generative AI by audience reach.
- 25.8% of tracked U.S. searches return an AI Overview, based on a study of 2.37 million queries.
- 57% of long-tail queries trigger an AI Overview response.
- 65% drop in organic click-through when an AI Overview is present at the top of the page.
A misspelled answer at this volume is no longer a quirky screenshot. It is the answer most users will accept and move on with.
When Disregard Becomes a Command
The spelling problem is one symptom of a broader category of failure where AI Overviews treat input the wrong way. Separate reports last week confirmed that searching the word “disregard” returned not a dictionary card but the line “Understood. Let me know whenever you have a new prompt or question!” Single-word queries like ignore, dismiss, and skip produced the same behavior. The AI was reading the search bar as a chat box and obeying the words it found there.
Google acknowledged the bug on May 23. “We’re aware that AI Overviews are misinterpreting some action-related queries, and we’re working on a fix, which will roll out soon,” a company spokesperson said. The dictionary issue was patched within days.
Both failures share a root. The AI Overview layer interprets every search through natural language processing, then generates free-form text on top of whatever the deterministic search index returned. When the input contains a command-shaped word or a question about letters, the language model in the loop responds with what it was trained to do, which is talk, not look up.
That structural choice produced AI Overviews’ biggest wins: faster answers for the messy, conversational queries that classic blue-link search handled badly. It also produced the spelling errors, the dictionary glitch, and the 2024 launch incidents in which the same system told users to put glue on pizza and eat rocks. The successes and the failures come from the same code path.
Frequently Asked Questions
Why does Google’s AI Overview misspell words like Google and Trump?
The model behind AI Overviews does not process individual letters. It processes tokens, which are short numeric fragments representing common letter groups, so questions about spelling or letter counts force the model to reconstruct character data it never had direct access to. That reconstruction often fails on words with repeated letters or unusual letter sequences.
What is tokenization, and why does it cause this?
Tokenization is the step that turns text into numbers before a language model sees it. The most common method, byte-pair encoding, splits words into recurring sub-word pieces like “st”, “raw” and “berry” for strawberry. The model learns the meaning of each piece but loses direct access to the individual characters inside it, which is why counting and spelling tasks are unreliable.
Will Google fix the spelling problem?
Google has said it is working on the specific issue and patched the related “disregard” bug within days. The underlying limitation is harder. Researchers including Sheridan Feucht at Northeastern University have said there is likely no perfect tokenizer that removes the trade-off, so the surface symptoms can be reduced but the structural problem will keep producing edge cases.
How can I turn off AI Overviews in Google Search?
Adding the modifier “-ai” to a query, or appending an apostrophe and a unique word, has been reported to suppress the AI Overview box on most searches. Switching to Google Search’s Web view from the tools menu also removes the Overview layer and shows only the traditional ranked links.
Do other AI chatbots have the same problem?
Yes. OpenAI’s GPT-4o, Anthropic’s Claude, and Meta’s Llama models all use sub-word tokenizers, and all of them have been documented failing the same letter-counting and spelling tests at various points. Some chatbots now route counting questions to a code-execution tool that operates on raw characters, which sidesteps the limit rather than fixing it.
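The kind of helper such a chatbot might hand the question to can be very small. The sketch below is an assumption about what that tool could look like, not any vendor's implementation; the function names `count_letter` and `spell_out` are hypothetical.

```python
def count_letter(word, letter):
    """Count occurrences of a single letter in a word, ignoring case."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


def spell_out(word):
    """Return the word spelled letter by letter, e.g. 'c-a-t'."""
    return "-".join(word.lower())
```

Because the code operates on the raw string rather than on tokens, `count_letter("strawberry", "r")` gives 3 and `spell_out("Trump")` gives `t-r-u-m-p` every time. The language model's only job is to recognize the question and pass along the right arguments.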
How often do AI Overviews appear in Google Search now?
AI Overviews appear in roughly one in four tracked U.S. searches, and in more than half of queries that are seven words or longer, according to BrightEdge and other third-party trackers. Google has said the feature reaches around two billion users a month, which makes it the largest consumer deployment of generative AI by audience.
