Damerau-Levenshtein Distance Calculator | Typo Distance That Counts Transpositions
Enter two strings to get the Damerau-Levenshtein distance, similarity %, and normalized value. Unlike plain Levenshtein, an adjacent two-character swap counts as a single edit, and the four operation types are broken out so you can see exactly how the strings differ.
💡 About This Tool
Typing "recieve" instead of "receive" or "teh" instead of "the" is just two neighboring letters in the wrong order. Plain Levenshtein distance treats that swap as two edits (a delete plus an insert, or two substitutions), which overstates how far apart the strings really are. For keyboard typos and OCR misreads, that extra distance throws off your similarity thresholds.
Damerau-Levenshtein distance fixes this by counting an adjacent transposition as a single operation, so it tracks how humans actually make mistakes. This calculator returns the distance and then decomposes it: how many insertions, deletions, substitutions, and transpositions made up that number. That breakdown lets you tune a fuzzy-search cutoff, prune spell-checker candidates, or validate a deduplication rule against your own data instead of guessing.
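The breakdown described above can be sketched in Python. This is a minimal illustration, not the tool's actual code: the function name `osa_breakdown` and the tie-breaking order in the backtrace are assumptions, and when several edit scripts share the same minimum cost, a different order would give a different (equally valid) split.

```python
def osa_breakdown(a: str, b: str) -> dict:
    """Hypothetical helper: OSA distance plus a count of each operation type."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # match or substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition

    # Walk back from the bottom-right cell, counting one optimal edit script.
    counts = {"insertions": 0, "deletions": 0, "substitutions": 0, "transpositions": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1                      # characters match, no cost
        elif (i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
              and d[i][j] == d[i - 2][j - 2] + 1):
            counts["transpositions"] += 1
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return {"distance": d[m][n], **counts}


print(osa_breakdown("recieve", "receive"))
# {'distance': 1, 'insertions': 0, 'deletions': 0, 'substitutions': 0, 'transpositions': 1}
```

The four counts always sum to the distance, because each step of the backtrace accounts for exactly one unit of cost.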
🧐 Frequently Asked Questions
How is this different from Levenshtein distance? The set of allowed operations differs. Levenshtein counts insertions, deletions, and substitutions. Damerau-Levenshtein adds the transposition of two adjacent characters as one operation. Turning "ab" into "ba" is distance 2 under Levenshtein but distance 1 under Damerau-Levenshtein.
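To see the difference concretely, here is a small Python sketch in which a single flag switches the transposition rule on or off. The function name `edit_distance` and its `transpositions` parameter are illustrative assumptions; with the flag enabled it computes the restricted (OSA) form of the distance.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance; with transpositions=True, the OSA distance."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i characters
    for j in range(n + 1):
        d[0][j] = j          # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # match or substitution
            )
            # The only extra rule: swap of two adjacent characters costs 1.
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)
            d[i][j] = best
    return d[m][n]


print(edit_distance("ab", "ba"))                       # 2
print(edit_distance("ab", "ba", transpositions=True))  # 1
```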
What does "OSA variant" mean? This tool uses the Optimal String Alignment variant: a Damerau-Levenshtein distance restricted so that no substring is edited more than once. It is simpler to compute and matches the unrestricted distance for most real-world inputs, but it can occasionally report a larger value. For example, turning "CA" into "ABC" costs 3 under OSA but 2 under unrestricted Damerau-Levenshtein, which can transpose "CA" to "AC" and then insert "B" between the two letters.
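The gap between the two variants can be checked with a short Python sketch that implements both. The function names `osa` and `damerau_levenshtein` are assumptions for illustration; the second follows the well-known dynamic-programming formulation that remembers the last row in which each character appeared, and it is not taken from this tool.

```python
def osa(a: str, b: str) -> int:
    """Restricted variant: a transposed pair may not be edited again."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted variant: edits may occur between the two swapped characters."""
    m, n = len(a), len(b)
    big = m + n                      # sentinel larger than any real distance
    last_row = {}                    # last row of a where each character was seen
    d = [[0] * (n + 2) for _ in range(m + 2)]
    d[0][0] = big
    for i in range(m + 1):
        d[i + 1][0] = big
        d[i + 1][1] = i
    for j in range(n + 1):
        d[0][j + 1] = big
        d[1][j + 1] = j
    for i in range(1, m + 1):
        last_col = 0                 # last column in this row where a[i-1] matched
        for j in range(1, n + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if cost == 0:
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                          # match or substitution
                d[i + 1][j] + 1,                         # insertion
                d[i][j + 1] + 1,                         # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1)  # transposition with gap
            )
        last_row[a[i - 1]] = i
    return d[m + 1][n + 1]


print(osa("CA", "ABC"), damerau_levenshtein("CA", "ABC"))  # 3 2
```

For plain adjacent swaps such as "ab" versus "ba", both functions return 1; they diverge only when an optimal script needs to touch the swapped characters again.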
How are similarity and the normalized value computed? The normalized value is the distance divided by the length of the longer string, ranging from 0 (identical) to 1 (completely different). Similarity is its complement, shown as (1 − normalized) × 100 percent, so strings of unequal length can be compared fairly.
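As a worked example of that arithmetic, the following Python sketch takes an already computed distance and returns both figures. The function name `normalize` is hypothetical, and treating two empty strings as identical (0 and 100 percent) is an assumption, since the formula is otherwise undefined for that case.

```python
def normalize(distance: int, a: str, b: str) -> tuple[float, float]:
    """Return (normalized value, similarity in percent) for a given distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Assumption: two empty strings count as identical.
        return 0.0, 100.0
    normalized = distance / longest          # 0 = identical, 1 = completely different
    similarity = (1 - normalized) * 100      # complement, as a percentage
    return normalized, similarity


# "recieve" vs "receive": one transposition, longer string has 7 characters.
normalized, similarity = normalize(1, "recieve", "receive")
print(f"{normalized:.3f} {similarity:.1f}%")  # 0.143 85.7%
```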
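Does it handle Unicode, accents, and emoji? Yes, with one caveat. Comparison runs per character, so accented letters and CJK characters each count as one. Emoji that are encoded as surrogate pairs count as two units, so a single emoji difference can contribute up to 2 to the distance and the string length used for normalization is larger than the visible character count.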
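The unit-counting behavior can be explored with a short Python sketch. It assumes the "two units" mentioned above are UTF-16 code units, which is an inference rather than something this page states; the helper name `utf16_units` is hypothetical. The last lines also show that a decomposed accent (base letter plus combining mark) is two code points, so normalizing input to NFC before comparing is one way to keep accented letters at one unit each.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Number of UTF-16 code units needed to store s (2 bytes per unit)."""
    return len(s.encode("utf-16-le")) // 2


samples = {
    "precomposed e-acute": "\u00e9",
    "CJK character": "\u6f22",
    "emoji U+1F600": "\U0001F600",
}
for label, text in samples.items():
    # len() in Python counts code points, not UTF-16 units.
    print(label, len(text), utf16_units(text))
# precomposed e-acute 1 1
# CJK character 1 1
# emoji U+1F600 1 2

decomposed = unicodedata.normalize("NFD", "\u00e9")   # "e" + combining acute
print(len(decomposed))                                 # 2
print(len(unicodedata.normalize("NFC", decomposed)))   # 1
```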
Is there an input limit? Each field accepts up to 2000 characters. Because the algorithm runs in time proportional to the product of the two lengths, comparing long passages takes noticeably longer than comparing short words.
📚 Edit-Distance Field Notes
The transposition rule comes from Frederick Damerau, who reported in 1964 that more than 80 percent of the human spelling errors he examined were a single insertion, deletion, substitution, or swap of two adjacent letters. That observation became a foundation for spell checkers and fuzzy search, and it still powers "did you mean" suggestions and autocorrect today. In practice, teams rarely use the raw distance alone: they pair it with the length-normalized value or a fixed threshold. A common pattern is to treat a distance of 1–2 as "near-identical" for short names, and a normalized value under 0.1 as the equivalent bar for longer text. Measuring those cutoffs on your own data with a tool like this keeps false matches in check.
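The two-tier cutoff described above might look like this in Python. Everything here is a sketch under stated assumptions: the function name `is_near_match` is hypothetical, and the `short_len=10` boundary between "short names" and "longer text" is an invented placeholder that should be tuned on your own data, as should the other two thresholds.

```python
def is_near_match(distance: int, a: str, b: str,
                  short_len: int = 10,        # assumed boundary, not from the tool
                  max_edits: int = 2,         # "distance of 1-2" for short names
                  max_normalized: float = 0.1) -> bool:
    """Apply a raw-distance cutoff to short strings, a normalized one to long ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True                            # assumption: two empty strings match
    if longest <= short_len:
        return distance <= max_edits
    return distance / longest < max_normalized


print(is_near_match(1, "teh", "the"))          # True: short, 1 edit
print(is_near_match(5, "a" * 40, "b" * 40))    # False: 5 / 40 = 0.125
```

Switching rules by length avoids the main weakness of each cutoff: a normalized threshold alone rejects almost every typo in a three-letter word, while a fixed edit count alone is too strict for a long passage.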