Why Google’s AI Overviews Can’t Spell Google, Trump or Strawberry

<p>Google’s AI Overviews keep returning the wrong number of letters for common words like Google, poop, and journalism, and the wrong spelling of the U.S. president’s last name. The cause traces back to the language model under the AI Overview layer, which never reads letters at all. It reads tokens, numerical fragments of text that every major language model is built on.</p>
<p>Google told TechCrunch on Wednesday that “counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue.” Researchers who study that exact challenge say there is no perfect fix. And the system producing the answers now appears on roughly one in four U.S. Google search queries.</p>
<h2>The Words Google Couldn’t Spell</h2>
<p>Screenshots of Google’s AI Overview misspelling common words have piled up over the past week. TechCrunch tested the system and got back the wrong count for the letter P in Google, the wrong number of R’s in poop, a stray D inside journalism, and a scrambled version of Trump. The errors landed days after a separate flaw broke the system’s dictionary feature entirely.</p>
<p>Most of the misspellings share a pattern. They swap a letter inside the word rather than at its edge, and they fall apart most when the word contains repeats.</p>
<table>
<thead>
<tr>
<th>Word queried</th>
<th>What the AI Overview returned</th>
<th>Correct answer</th>
</tr>
</thead>
<tbody>
<tr>
<td>Google</td>
<td><strong>Two P’s</strong></td>
<td>Zero P’s</td>
</tr>
<tr>
<td>poop</td>
<td>One R</td>
<td>Zero R’s</td>
</tr>
<tr>
<td>journalism</td>
<td>j-o-u-r-n-a-d-i-s-m</td>
<td>j-o-u-r-n-a-l-i-s-m</td>
</tr>
<tr>
<td>Trump</td>
<td>t-r-p-u-m</td>
<td>t-r-u-m-p</td>
</tr>
</tbody>
</table>
<p>The journalism answer is the cleanest illustration of what is going wrong. Google’s AI claimed the word contains two D’s, then printed a version with one D substituted for the L. Both the count and the spelling were wrong, and they contradicted each other inside the same response.</p>
<figure class="wp-block-image aligncenter featured-image" style="margin:1.5em auto;text-align:center;"><img class="aligncenter" src="https://budgyapp.com/wp-content/uploads/2026/05/why-google-ai-overviews-misspell-words-like-google-and-strawberry-explained.webp" alt="Why Google AI Overviews misspell words like Google and strawberry explained." style="width:100%;max-width:800px;height:auto;border-radius:8px;display:block;margin:0 auto;" /><figcaption style="text-align:center;font-size:0.85em;color:#888;margin-top:0.5em;">Why Google AI Overviews misspell words like Google and strawberry explained.</figcaption></figure>
<h2>Why Transformers Can’t See Letters</h2>
<p>AI language models do not read text the way a child learning to spell does. Before any computation happens, the input passes through a
tokenizer, a program that chops words into recurring fragments and converts each fragment into a number.</p>
<h3>From Letters to Numbers</h3>
<p>Most production models, including the systems powering Google’s AI Overviews, use a method called <strong>byte-pair encoding</strong>. The technique builds its vocabulary by scanning huge volumes of training text, merging the most common letter pairs into tokens, then merging the most common pairs of tokens into longer fragments, and so on. The word “strawberry” becomes the three tokens “st”, “raw” and “berry” in OpenAI’s GPT-4o tokenizer. The model never sees the ten characters. It sees three numbers.</p>
<p>Matthew Guzdial, an assistant professor of computing science at the University of Alberta, summarized the gap for TechCrunch.</p>
<blockquote>
<p>LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding. When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’</p>
</blockquote>
<h3>What Tokens Throw Away</h3>
<p>That trade-off is deliberate. Treating every character as its own input would produce sequences four to five times longer, which would multiply the cost of training and running each query. The compromise lets a model fit longer passages into the same compute
budget. The price it pays is the inability to reliably introspect its own input at the letter level. Ask the model to count R’s in strawberry and it must reconstruct, from the meanings of “st”, “raw” and “berry”, a property the architecture removed at step one.</p>
<h2>The Strawberry Problem Has a Long Tail</h2>
<p>The pattern has been documented long enough to have a nickname. AI researchers call it the <strong>strawberry problem</strong>, after a question that became a running joke: ask a model how many R’s are in strawberry and the most common answer is two, not three. The error survived multiple major model releases through 2024 and 2025, and it now sits inside the system Google serves to its largest audience.</p>
<h3>What the Counting Study Found</h3>
<p>A <a href="https://arxiv.org/html/2412.18626v1" target="_blank" rel="noopener">December 2024 study on letter counting in language models</a> tested thirteen popular LLMs and found that most failed on words where a letter appeared more than twice. The strongest correlation was not with how often the word appeared in training data, the authors wrote, but with the complexity of the counting operation itself. Models could recognize each letter in isolation. They could not reliably tally repeats inside a single token.</p>
<p>Sheridan Feucht, a PhD student at Northeastern University who studies how large language models build internal representations, was not optimistic about a clean solution. “It’s kind of hard to get around the question of what exactly a ‘word’ should
be for a language model,” she told TechCrunch, adding that even if researchers agreed on “a perfect token vocabulary, models would probably still find it useful to chunk things even further.” Her conclusion: “My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”</p>
<h3>The Token-Free Alternatives</h3>
<p>Alternative architectures exist on the research bench. <a href="https://arxiv.org/pdf/2401.13660" target="_blank" rel="noopener">MambaByte, a token-free state space model published in 2024</a>, processes raw bytes without any tokenizer at all and matches the performance of subword transformers on several benchmarks while staying more robust to corrupted input. Character-level models like CANINE and Charformer have shown similar strengths. None of them powers a consumer search engine that serves billions of queries a day. The cost and latency advantages of subword tokenization remain too large to give up at production scale, which means the bug stays embedded in the foundation.</p>
<h2>Two Billion Users, One Architectural Limit</h2>
<p>The errors would matter less if AI Overviews were a niche feature. They are not. Google’s AI-generated answer boxes now reach roughly two billion users a month, surface on more than a quarter of U.S. queries tracked by BrightEdge, and climb past 50% prevalence on long-tail questions seven words or longer.</p>
<p>The same tracking shows AI Overviews are doing what Google promised they would. They are keeping users on the search page. When an Overview is present, the click-through rate to the
top organic result falls sharply, which means the AI-generated text is increasingly the only answer a user reads before moving on. Google has continued to position the feature as the centerpiece of its product roadmap, including in the company’s <a href="https://blog.google/products-and-platforms/products/search/search-io-2026/" target="_blank" rel="noopener">Search updates at I/O 2026</a>.</p>
<ul>
<li><strong>2 billion</strong> monthly users now see an AI Overview, Google’s biggest deployment of generative AI by audience reach.</li>
<li><strong>25.8%</strong> of tracked U.S. searches return an AI Overview, based on a study of 2.37 million queries.</li>
<li><strong>57%</strong> of long-tail queries trigger an AI Overview response.</li>
<li><strong>65%</strong> drop in organic click-through when an AI Overview is present at the top of the page.</li>
</ul>
<p>A misspelled answer at this volume is no longer a quirky screenshot. It is the answer most users will accept and move on with.</p>
<h2>When Disregard Becomes a Command</h2>
<p>The spelling problem is one symptom of a broader category of failure where AI Overviews treat input the wrong way. Separate reports last week confirmed that searching the word “disregard” returned not a dictionary card but the line “Understood. Let me know whenever you have a new prompt or question!” Single-word queries like ignore, dismiss, and skip produced the same behavior. The AI was reading the search bar as a chat box and obeying the words it found
there.</p>
<p>Google acknowledged the bug on May 23. “We’re aware that AI Overviews are misinterpreting some action-related queries, and we’re working on a fix, which will roll out soon,” a company spokesperson said. The dictionary issue was patched within days.</p>
<p>Both failures share a root. The AI Overview layer interprets every search through natural language processing, then generates free-form text on top of whatever the deterministic search index returned. When the input contains a command-shaped word or a question about letters, the language model in the loop responds with what it was trained to do, which is talk, not look up.</p>
<p>That structural choice produced AI Overviews’ biggest wins: faster answers for messy, conversational queries that classic blue-link search handled badly. It also produced the spelling errors, the dictionary glitch, and the 2024 launch incidents when the same system told users to put glue on pizza and eat rocks. The successes and the failures come from the same code path.</p>
<h2>Frequently Asked Questions</h2>
<h3>Why does Google’s AI Overview misspell words like Google and Trump?</h3>
<p>The model behind AI Overviews does not process individual letters. It processes tokens, which are short numeric fragments representing common letter groups, so questions about spelling or letter counts force the model to reconstruct character data it never had direct access to. That reconstruction often fails on words with repeated letters or unusual letter sequences.</p>
<h3>What is tokenization, and why does it cause this?</h3>
<p>Tokenization is the step that turns text
into numbers before a language model sees it. The most common method, byte-pair encoding, splits words into recurring sub-word pieces like “st”, “raw” and “berry” for strawberry. The model learns the meaning of each piece but loses direct access to the individual characters inside it, which is why counting and spelling tasks are unreliable.</p>
<h3>Will Google fix the spelling problem?</h3>
<p>Google has said it is working on the specific issue and patched the related “disregard” bug within days. The underlying limitation is harder. Researchers including Sheridan Feucht at Northeastern University have said there is likely no perfect tokenizer that removes the trade-off, so the surface symptoms can be reduced but the structural problem will keep producing edge cases.</p>
<h3>How can I turn off AI Overviews in Google Search?</h3>
<p>Adding the modifier “-ai” to a query, or appending an apostrophe and a unique word, has been reported to suppress the AI Overview box on most searches. Switching to Google Search’s Web view from the tools menu also removes the Overview layer and shows only the traditional ranked links.</p>
<h3>Do other AI chatbots have the same problem?</h3>
<p>Yes. OpenAI’s GPT-4o, Anthropic’s Claude, and Meta’s Llama models all use sub-word tokenizers, and all of them have been documented failing the same letter-counting and spelling tests at various points. Some chatbots now route counting questions to a code-execution tool that operates on raw characters, which sidesteps the limit rather than fixing
it.</p>
<h3>How often do AI Overviews appear in Google Search now?</h3>
<p>AI Overviews appear in roughly one in four tracked U.S. searches, and in more than half of queries that are seven words or longer, according to BrightEdge and other third-party trackers. Google has said the feature reaches around two billion users a month, which makes it the largest consumer deployment of generative AI by audience.</p>
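<p>The gap described under “From Letters to Numbers” can be sketched in a few lines of Python. This is a minimal illustration, not Google’s or OpenAI’s implementation. The vocabulary <code>TOY_VOCAB</code> and its integer IDs are invented, and the hypothetical <code>encode</code> helper uses a greedy longest-match lookup as a simplified stand-in for real byte-pair-encoding merge rules. The <code>count_letter</code> helper shows the kind of raw-character operation a code-execution tool would run instead.</p>

```python
# Hypothetical toy vocabulary: the fragments match the article's example,
# but the integer IDs are invented for illustration only.
TOY_VOCAB = {"st": 101, "raw": 102, "berry": 103}


def encode(word: str, vocab: dict[str, int]) -> list[int]:
    """Split `word` into known fragments using greedy longest-match.

    This is a simplification, not real BPE: production tokenizers apply
    learned merge rules rather than a longest-match dictionary lookup.
    """
    ids = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate fragment starting at `pos` first.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no fragment covers {word[pos:]!r}")
    return ids


def count_letter(word: str, letter: str) -> int:
    """Count a letter on the raw characters, ignoring case."""
    return word.lower().count(letter.lower())


token_ids = encode("strawberry", TOY_VOCAB)
print(token_ids)                        # [101, 102, 103]
print(len("strawberry"))                # 10
print(count_letter("strawberry", "r"))  # 3
print(count_letter("Google", "p"))      # 0
```

<p>Counting on the raw string is trivial. The difficulty is that a model only ever receives the ID list, so the letters inside each fragment are not directly visible to it.</p>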
