Frequency Analysis
A free letter frequency analysis tool, useful for breaking classical ciphers and cryptograms, detecting the language of a text, and similar tasks. Paste your text and click the Analyze button. More detailed instructions follow below.
How To Use It
This frequency analysis tool can analyze unigrams (single letters), bigrams (two-letter groups, also called digraphs), trigrams (three-letter groups, also called trigraphs), or longer n-grams.
Unigram analysis
- Set N-gram size to 1.
- If you are analyzing polyalphabetic substitution ciphers (for example Vigenère), set the step size to a candidate key length and vary the offset from 0 up to the step size minus 1, so that each key position is analyzed separately.
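As a rough sketch of what a step size and offset do in the unigram case, the Python below counts only the letters found at every `step`-th position, starting from `offset`. The function name `column_counts`, the sample ciphertext, and the choice to uppercase the text and drop non-letters are assumptions made for illustration, not the tool's actual code.

```python
from collections import Counter

def column_counts(text, step, offset=0):
    """Count letters at positions offset, offset + step, offset + 2*step, ..."""
    # Assumption: fold to uppercase and ignore anything that is not a letter.
    letters = [c for c in text.upper() if c.isalpha()]
    return Counter(letters[offset::step])

# Hypothetical ciphertext and a guessed key length of 3.
ciphertext = "LXFOPVEFRNHR"
for offset in range(3):
    print(offset, column_counts(ciphertext, step=3, offset=offset).most_common(3))
```

If the guessed key length is right, every letter in a column was enciphered with the same key letter, so each column's counts should look like an ordinary monoalphabetic distribution.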
Polygram analysis (bigram, trigram or higher)
- Set N-gram size to the number of letters per group (2 for bigrams, 3 for trigrams, etc.).
- For digraph ciphers (Playfair, Bifid, Four-square, etc.), the step size should be 2 and the offset 0.
- For the Trifid cipher, the step size should be 3 and offset 0.
Even for single-letter monoalphabetic substitution ciphers, a polygram analysis can be useful for spotting common trigrams (such as "the"). In that case, set the step size to 1.
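The idea of a polygram count with step size 1 can be sketched in a few lines of Python. The helper `count_ngrams` and the sample sentence are hypothetical, and the sketch assumes that spaces and other non-letters are removed first, so groups may run across word boundaries.

```python
from collections import Counter

def count_ngrams(text, n):
    """Count every overlapping n-letter group (step size 1, offset 0)."""
    # Assumption: fold to uppercase and keep letters only.
    letters = "".join(c for c in text.upper() if c.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

sample = "the cat and the dog and the bird"
print(count_ngrams(sample, 3).most_common(1))  # [('THE', 3)]
```

In a monoalphabetic ciphertext the same count would show the enciphered form of a frequent trigram standing out in the same way.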
Options
Preserve Casing
This treats uppercase and lowercase letters as different characters. It should only be enabled for ciphers where the case matters, for instance the ROT47 cipher.
Keep Spaces & Non-Letters
This keeps characters that are not letters (spaces, digits, punctuation) in the analysis instead of dropping them.
Remove Accents
This will remove accents from letters. It will for example make the words Vigenère and Vigenere equal. This option is rarely used.
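One plausible way to picture these three options is as a clean-up pass over the text before any counting. The sketch below is an assumption about how such a pass could work, not the tool's implementation: the function `prepare` and its parameter names are made up, accents are stripped via Unicode decomposition, and case is folded to uppercase when casing is not preserved.

```python
import unicodedata

def prepare(text, preserve_casing=False, keep_non_letters=False, remove_accents=False):
    """Clean up text before counting, following the three options described above."""
    if remove_accents:
        # Decompose accented letters (è -> e + combining grave), then drop the marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not preserve_casing:
        text = text.upper()  # assumption: fold everything to uppercase
    if not keep_non_letters:
        text = "".join(c for c in text if c.isalpha())
    return text

print(prepare("Vigenère cipher!", remove_accents=True))  # VIGENERECIPHER
```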
Step Size
This determines the number of positions between the starts of consecutive codes. For example, let's take bigrams from this text:
ABCDEFGHIJKLMNOPQR
With a step size of 1 we would get:
AB BC CD DE EF FG GH, etc
With a step size of 2 we would get:
AB CD EF GH IJ KL MN, etc
Offset
This determines the position at which the first code starts. For example, let's take bigrams with a step size of 2 from this text:
ABCDEFGHIJKLMNOPQR
With an offset of 0 we would get:
AB CD EF GH IJ KL MN, etc
With an offset of 1 we would get:
BC DE FG HI JK LM NO, etc
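The step size and offset examples above can be reproduced with a short Python sketch. The function `ngrams` is a hypothetical helper, and dropping an incomplete group at the end of the text is an assumption.

```python
def ngrams(text, n, step=1, offset=0):
    """Return the n-character codes that start at offset, offset + step, ..."""
    # Stop early enough that every code has exactly n characters.
    return [text[i:i + n] for i in range(offset, len(text) - n + 1, step)]

text = "ABCDEFGHIJKLMNOPQR"
print(ngrams(text, 2, step=1)[:7])            # ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'GH']
print(ngrams(text, 2, step=2)[:7])            # ['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MN']
print(ngrams(text, 2, step=2, offset=1)[:7])  # ['BC', 'DE', 'FG', 'HI', 'JK', 'LM', 'NO']
```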
Index of Coincidence
The Index of Coincidence is a statistical measure that can help identify the type of cipher and the language used. Texts written in a natural language (English or any other) usually have an index of coincidence that is characteristic of that language. A monoalphabetic substitution cipher only relabels the letters, so the index of coincidence stays the same. The same is true for transposition ciphers, which only reorder the letters.
A non-normalized Index of Coincidence is used because the tool should be useful for any language. If you want the normalized index of coincidence, multiply the value by the number of letters in the alphabet of the language (26 for English, 27 for Spanish, etc.).
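For reference, the Index of Coincidence is commonly defined as the sum of n_i * (n_i - 1) over all letters, divided by N * (N - 1), where n_i is the count of letter i and N is the total number of letters. The Python sketch below uses that textbook definition and assumes, without confirmation, that the tool computes the same quantity. The function name and the optional `alphabet_size` parameter are made up for illustration.

```python
from collections import Counter

def index_of_coincidence(text, alphabet_size=None):
    """Non-normalized IoC; multiplied by alphabet_size when one is given."""
    # Assumption: fold to uppercase and count letters only.
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0  # undefined for fewer than two letters; 0.0 is an arbitrary choice
    counts = Counter(letters)
    ic = sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
    return ic * alphabet_size if alphabet_size else ic

print(index_of_coincidence("HELLO"))                    # 0.1
print(index_of_coincidence("HELLO", alphabet_size=26))  # normalized for English
# Substitution (IFMMP) and transposition (OLLEH) leave the value unchanged.
print(index_of_coincidence("IFMMP") == index_of_coincidence("OLLEH"))  # True
```

Such a short sample is only useful for checking the arithmetic. For long English texts, non-normalized values near 0.067 are commonly quoted, against roughly 0.038 (1/26) for uniformly random letters.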