Word Frequency Counter: See What's Actually in Your Text
You wrote 2,000 words and want to know if you're repeating yourself. Or you're analyzing survey responses and need to spot common themes. Or you're doing SEO and want to check keyword density.
This tool counts every word in your text, ranks them by frequency, and shows count and percentage. Sort alphabetically or by frequency. Export the results as CSV.
Runs in your browser. Nothing leaves your machine.
What's actually happening
The tool tokenizes your input by extracting words using a regex pattern that matches sequences of letters and apostrophes. Everything is lowercased first, so "The" and "the" count as the same word. Numbers and punctuation are stripped — they're not words.
Each unique word gets a count and a percentage of total words. The results are displayed in a ranked table, sorted by frequency (most common first) by default.
The tokenization pattern is \b[a-z']+\b, so contractions like "don't" and "it's" are treated as single words. Hyphenated words like "well-known" get split into "well" and "known" because the hyphen is not part of the matched character set, so it ends the word.
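As a rough illustration of the steps above, here is a minimal Python sketch. It is not the tool's own source (the tool runs in your browser): only the regex comes from the description, while the names `tokenize` and `rank_words` and the alphabetical tie-break are assumptions made for this example.

```python
import re
from collections import Counter

# The pattern quoted above; everything else here is an illustrative assumption.
WORD_RE = re.compile(r"\b[a-z']+\b")

def tokenize(text):
    """Lowercase the text, then pull out runs of letters and apostrophes."""
    return WORD_RE.findall(text.lower())

def rank_words(text):
    """Return (rank, word, count, percentage) rows, most frequent first."""
    words = tokenize(text)
    total = len(words)
    counts = Counter(words)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, count, 100 * count / total)
        for rank, (word, count) in enumerate(ordered, start=1)
    ]

print(tokenize("Don't worry: it's a well-known trick."))
for row in rank_words("The cat saw the dog. The dog ran."):
    print(row)
```

The first call keeps "don't" and "it's" whole and splits "well-known" in two. In the second, "the" ranks first with 3 of the 8 counted words, or 37.5%.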
Using it
Paste your text. Hit Analyze. The results appear in a table with rank, word, count, and percentage. Toggle between frequency and alphabetical sorting. Hit Export CSV to download the data.
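If you want to reproduce the sort toggle and the export in a script, the following Python sketch shows one way. The column names in the header row and the helper name `to_csv` are assumptions. The tool's actual CSV layout may differ.

```python
import csv
import io
from collections import Counter

counts = Counter({"the": 3, "dog": 2, "cat": 1})
total = sum(counts.values())

# The two orderings the toggle switches between.
by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
alphabetical = sorted(counts.items())

def to_csv(rows):
    """Write rank, word, count, percentage rows; the header names are assumed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "word", "count", "percentage"])
    for rank, (word, count) in enumerate(rows, start=1):
        writer.writerow([rank, word, count, round(100 * count / total, 2)])
    return buffer.getvalue()

print(alphabetical)
print(to_csv(by_frequency))
```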
When you'd actually reach for this
- You're editing a blog post and want to check for overused words before publishing
- You're analyzing open-ended survey responses to identify common themes without NLP tooling
- You're checking keyword density for SEO — making sure your target keyword appears enough (but not too much)
- You're studying a text for linguistic analysis or language learning
- You're debugging a text generation system and want to verify the output distribution
Why not just Ctrl+F?
Ctrl+F tells you how many times one specific word appears. This tool shows you every word at once, ranked. You don't have to guess which words to search for — the data tells you.
It's the difference between asking "did I use 'however' too much?" and "what did I use too much?" The latter catches things you didn't think to check.
What the numbers actually mean
Count is straightforward — how many times the word appears.
Percentage is count divided by total words. In most English text, you'll see "the" at 5-7%, "and" at 2-3%, and "a" at 2-3%. These are stop words — they're supposed to be frequent. Don't worry about them.
The interesting data is in the content words. If you're writing about "performance" and that word is at 0.3% in a 2,000-word article, it appears about 6 times. That might be fine, or it might mean you're dancing around the topic without using the actual keyword.
For SEO, a keyword density of 1-2% is a commonly cited rule of thumb rather than a published ranking threshold. Below 0.5%, search engines might not associate your page with that keyword. Above 3%, it starts to look like keyword stuffing.
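The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only: `keyword_density` and `describe_density` are hypothetical helpers, and the cut-offs are the rule-of-thumb bands from the paragraph above, not values published by any search engine.

```python
import re

WORD_RE = re.compile(r"\b[a-z']+\b")

def keyword_density(text, keyword):
    """Percentage of all counted words that are exactly `keyword`."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return 100 * words.count(keyword.lower()) / len(words)

def describe_density(percent):
    """Map a density to the rule-of-thumb bands described above."""
    if percent < 0.5:
        return "very low"
    if percent < 1.0:
        return "below the 1-2% range"
    if percent <= 2.0:
        return "within the 1-2% range"
    if percent <= 3.0:
        return "above the 1-2% range"
    return "possible keyword stuffing"

# Worked example from the text: 6 occurrences in 2,000 words.
sample = ("performance " * 6) + ("filler " * 1994)
density = keyword_density(sample, "performance")
print(round(density, 2), describe_density(density))
```

Six occurrences in 2,000 words works out to 0.3%, which lands in the lowest band.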
Limitations
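No stemming: "run", "runs", "running", and "ran" are counted as four different words. The tool does exact matching, not linguistic analysis. If you need inflected forms merged, you need an NLP library. A stemmer handles regular endings such as "runs" and "running", while an irregular form like "ran" generally needs lemmatization.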
No stop word filtering — common words like "the", "is", "and" dominate every frequency list. The tool doesn't filter them because what counts as a "stop word" depends on your use case. Ignore the top 10-20 words if you want content word frequencies.
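If you would rather filter than skip by eye, a few lines of Python will do it outside the tool. The sketch below assumes the pattern quoted earlier. The stop list is a tiny hand-picked set and `content_word_counts` is a hypothetical helper name, so adjust both to your use case.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\b[a-z']+\b")

# A deliberately tiny, hand-picked list; what belongs here depends on your text.
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to", "in", "it", "that"}

def content_word_counts(text, stop_words=STOP_WORDS):
    """Count words as before, but drop anything in the stop list."""
    words = WORD_RE.findall(text.lower())
    return Counter(word for word in words if word not in stop_words)

counts = content_word_counts("The cache is fast and the cache is small.")
print(counts.most_common(3))
```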
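English-centric tokenization: the pattern as quoted matches only the unaccented letters a-z, so it works well for English, but accented letters in other Latin-script languages (the "é" in "café", for example) are not matched, and words containing them can be truncated or split. For CJK languages where words aren't separated by spaces, the tool won't produce meaningful results. You'd need a word segmentation library for Chinese or Japanese.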
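Apostrophes: "don't" is one word when it is typed with a straight apostrophe. With the pattern as quoted, a curly apostrophe (’) is not matched, so "don’t" pasted from a word processor is likely to split into "don" and "t". Leading and trailing apostrophes or quotes are dropped, so 'tis is counted as "tis" and 'hello' as "hello".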
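To see how quote characters affect matching, and one way to sidestep the problem before pasting text in, here is a small Python sketch. It assumes the pattern quoted earlier. `normalize_apostrophes` is a hypothetical helper, not a feature of the tool.

```python
import re

WORD_RE = re.compile(r"\b[a-z']+\b")

def normalize_apostrophes(text):
    """Replace curly single quotes (U+2018, U+2019) with a straight apostrophe."""
    return text.replace("\u2018", "'").replace("\u2019", "'")

# "Don’t say ‘hello’ twice." written with curly quote characters.
curly = "Don\u2019t say \u2018hello\u2019 twice."

print(WORD_RE.findall(curly.lower()))
print(WORD_RE.findall(normalize_apostrophes(curly).lower()))
```

Without normalization the contraction comes out as two tokens, "don" and "t". After normalization it is matched as "don't", and the quoted word is still counted as plain "hello".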
Troubleshooting
The top words are all "the", "and", "is" — that's normal for English text. Content words start around rank 15-20. Mentally skip the stop words, or export to CSV and filter them in a spreadsheet.
My word count doesn't match my document — the tool only counts alphabetic words. Numbers, symbols, and punctuation don't count. Also, hyphenated words are split. "state-of-the-art" counts as four words here.
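The mismatch is easy to reproduce. The Python sketch below compares a whitespace-based count, which is roughly how many editors count words, with the alphabetic-only pattern quoted earlier. The sample sentence is made up for the example.

```python
import re

WORD_RE = re.compile(r"\b[a-z']+\b")

text = "In 2024 we shipped 3 state-of-the-art models."

# Whitespace-separated chunks: numbers count, hyphenated words count once.
chunks = text.split()
# Alphabetic runs only: numbers vanish, hyphenated words are split.
words = WORD_RE.findall(text.lower())

print(len(chunks), chunks)
print(len(words), words)
```

The same sentence gives 7 by whitespace and 8 by the pattern: the two numbers drop out and "state-of-the-art" becomes four words.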
Contractions are missing — they're there, just tokenized as single words. "don't" appears as don't. Check the alphabetical sort to find them more easily.
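The percentages don't add up to 100%: they should, approximately. Each percentage is rounded for display, so small differences can occur. If they're way off, one possible cause is a lot of non-word content (numbers, URLs): it is stripped during tokenization and never appears in the table, so if it is still included in the total, the listed percentages sum to less than 100%.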
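The rounding effect is small and easy to demonstrate. This Python sketch assumes percentages are shown to one decimal place, which is a guess about the display and not something the tool documents.

```python
from collections import Counter

counts = Counter({"alpha": 1, "beta": 1, "gamma": 1})
total = sum(counts.values())

exact = [100 * c / total for c in counts.values()]
shown = [round(p, 1) for p in exact]  # assumed display precision: one decimal

print(shown)
print(sum(shown))
```

Three words at one third each display as 33.3 apiece, so the displayed column sums to about 99.9 even though the exact values sum to 100.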
Export CSV is empty — you need to run the analysis first by hitting the Analyze button. The export only works after results are generated.
What to do with the results
For writing: scan the top content words. If any word appears way more than you expected, you're probably repeating yourself. Swap in synonyms or restructure sentences.
For SEO: check that your target keyword is in the top 20-30 content words with at least 1% density. If it's not there, you're not optimizing for it regardless of what your meta tags say.
For analysis: export the CSV and work with it in a spreadsheet or script. Group related words manually, plot distributions, or feed it into further analysis tooling.
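As one example of working with the export in a script, the Python sketch below merges related word forms by hand. The CSV text stands in for an exported file. Its column names and numbers are made up for illustration, and the `GROUPS` mapping is something you would build yourself.

```python
import csv
import io
from collections import Counter

# Stand-in for the exported file; the header names are assumed, not confirmed.
EXPORTED = """rank,word,count,percentage
1,run,12,1.2
2,running,7,0.7
3,runs,5,0.5
4,cache,4,0.4
"""

# Hand-built grouping of related forms, since the tool does no stemming.
GROUPS = {"running": "run", "runs": "run", "ran": "run"}

grouped = Counter()
for row in csv.DictReader(io.StringIO(EXPORTED)):
    key = GROUPS.get(row["word"], row["word"])
    grouped[key] += int(row["count"])

print(grouped.most_common())
```

To run this against a real export, replace `io.StringIO(EXPORTED)` with an opened file and adjust the column names to match the file's header row.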