Overview

Compute the Damerau-Levenshtein distance, similarity %, and normalized value for two strings, with an insert, delete, substitute, and transpose breakdown.

📘 How to Use

  1. Type or paste your two strings into String A and String B
  2. Toggle case sensitivity and whitespace trimming as needed
  3. Read the edit distance, similarity %, and normalized value
  4. Check the insert / delete / substitute / transpose breakdown
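If you want to reproduce the two input toggles in your own pipeline, a minimal Python sketch might look like this. It assumes that "trimming" strips only leading and trailing whitespace and that case-insensitive mode folds case with `str.casefold()`. The calculator's exact rules are not documented here, and the helper name `prepare` is hypothetical.

```python
def prepare(text: str, case_sensitive: bool = True, trim: bool = False) -> str:
    """Apply the two input toggles before comparing (assumed behavior, not the tool's code)."""
    if trim:
        # Assumption: trimming removes leading and trailing whitespace only.
        text = text.strip()
    if not case_sensitive:
        # casefold() is a stronger lower(): it also maps e.g. "ß" to "ss".
        text = text.casefold()
    return text


a = prepare("  Receive ", case_sensitive=False, trim=True)
b = prepare("receive", case_sensitive=False, trim=True)
print(repr(a), a == b)  # 'receive' True
```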

Damerau-Levenshtein Distance Calculator

※ Uses the Optimal String Alignment (OSA) variant; the same substring is not edited more than once.

※ Inputs are capped at 2000 characters; distance is measured in Unicode code points.

Damerau-Levenshtein Distance Calculator | Typo Distance That Counts Transpositions

Enter two strings to get the Damerau-Levenshtein distance, similarity %, and normalized value. Unlike plain Levenshtein, an adjacent two-character swap counts as a single edit, and the four operation types are broken out so you can see exactly how the strings differ.

💡 About This Tool

Typing "recieve" instead of "receive" or "teh" instead of "the" is just two neighboring letters in the wrong order. Plain Levenshtein distance treats that swap as two edits (a delete plus an insert, or two substitutions), which overstates how far apart the strings really are. For keyboard typos and OCR misreads, that extra distance throws off your similarity thresholds.

Damerau-Levenshtein distance fixes this by counting an adjacent transposition as a single operation, so it tracks how humans actually make mistakes. This calculator returns the distance and then decomposes it: how many insertions, deletions, substitutions, and transpositions made up that number. That breakdown lets you tune a fuzzy-search cutoff, prune spell-checker candidates, or validate a deduplication rule against your own data instead of guessing.
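The distance-plus-breakdown idea can be sketched in Python as a dynamic-programming table for the OSA variant, followed by a walk back from the final cell that tallies each operation. This is a minimal sketch, not the calculator's source, and the function name `osa_with_breakdown` is hypothetical. When several edit paths tie, the walk-back's preference order (transpose, then match or substitute, then delete, then insert) decides which breakdown you get, so another tool may split the same distance differently.

```python
from collections import Counter


def osa_with_breakdown(a: str, b: str) -> tuple[int, Counter]:
    m, n = len(a), len(b)
    # d[i][j] = OSA distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a[i-1]
                d[i][j - 1] + 1,         # insert b[j-1]
                d[i - 1][j - 1] + cost,  # match or substitute
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap

    # Walk back from the bottom-right cell along one optimal path.
    ops = Counter()
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
                and d[i][j] == d[i - 2][j - 2] + 1):
            ops["transpose"] += 1
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops["substitute"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops["delete"] += 1
            i -= 1
        else:
            ops["insert"] += 1
            j -= 1
    return d[m][n], ops


dist, ops = osa_with_breakdown("recieve", "receive")
print(dist, dict(ops))  # 1 {'transpose': 1}
```

The counts always add up to the distance, because every step of the walk-back corresponds to exactly one unit of cost in the table.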

🧐 Frequently Asked Questions

How is this different from Levenshtein distance? The set of allowed operations differs. Levenshtein counts insertions, deletions, and substitutions. Damerau-Levenshtein adds the transposition of two adjacent characters as one operation. Turning "ab" into "ba" is distance 2 under Levenshtein but distance 1 under Damerau-Levenshtein.
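In code, the difference between the two metrics is a single extra case in the table update. In this minimal Python sketch (the name `edit_distance` and its `allow_transpose` flag are hypothetical), switching the flag on adds the adjacent-swap check and turns plain Levenshtein into the OSA distance.

```python
def edit_distance(a: str, b: str, allow_transpose: bool) -> int:
    """Levenshtein when allow_transpose is False, OSA Damerau-Levenshtein when True."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # The only Damerau addition: two adjacent characters swapped cost 1.
            if (allow_transpose and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


for a, b in [("ab", "ba"), ("teh", "the")]:
    print(a, b, edit_distance(a, b, False), edit_distance(a, b, True))
# ab ba 2 1
# teh the 2 1
```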

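What does "OSA variant" mean? This tool uses the Optimal String Alignment variant: a Damerau-Levenshtein distance with the rule that no substring is edited more than once. It needs only one extra check per table cell on top of plain Levenshtein and agrees with the unrestricted Damerau-Levenshtein distance for ordinary typos. It reports a larger value only when the cheapest fix would swap two characters and also insert or delete characters between them: turning "CA" into "ABC" is distance 3 under OSA but 2 unrestricted (swap to "AC", then insert "B").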
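To see the OSA caveat on a concrete pair, the sketch below computes both variants. The unrestricted version follows the standard dynamic-programming approach in the style of Lowrance and Wagner: it remembers the last row where each character appeared in A and the last matching column in the current row, so a transposition may span characters that are deleted or inserted between the swapped pair. Both function names are hypothetical, and the code is illustrative rather than the calculator's implementation.

```python
def osa_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def unrestricted_dl(a: str, b: str) -> int:
    m, n = len(a), len(b)
    maxdist = m + n
    # D[i + 1][j + 1] holds d[i][j]; row 0 and column 0 are the "-1" sentinels.
    D = [[0] * (n + 2) for _ in range(m + 2)]
    D[0][0] = maxdist
    for i in range(m + 1):
        D[i + 1][0] = maxdist
        D[i + 1][1] = i
    for j in range(n + 1):
        D[0][j + 1] = maxdist
        D[1][j + 1] = j
    last_row = {}  # last 1-based row in a where each character was seen
    for i in range(1, m + 1):
        last_col = 0  # last 1-based column in this row where b matched a[i-1]
        for j in range(1, n + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            D[i + 1][j + 1] = min(
                D[i][j] + cost,      # match or substitute
                D[i + 1][j] + 1,     # insert
                D[i][j + 1] + 1,     # delete
                # transpose a[k-1] and a[i-1], deleting/inserting what lies between
                D[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[a[i - 1]] = i
    return D[m + 1][n + 1]


print(osa_distance("CA", "ABC"), unrestricted_dl("CA", "ABC"))  # 3 2
print(osa_distance("ab", "ba"), unrestricted_dl("ab", "ba"))    # 1 1
```

The unrestricted version needs a lookup table keyed by character, which is part of why the lighter OSA recurrence is the common default.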

How are similarity and the normalized value computed? The normalized value is the distance divided by the length of the longer string, so it ranges from 0 (identical) to 1 (completely different). Similarity is its complement, shown as (1 − normalized) × 100 percent. Dividing by length puts short and long pairs on the same scale: 2 edits in a 4-character word give 0.5, while 2 edits in a 40-character sentence give 0.05.
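The two formulas translate directly into code. This Python sketch bundles a compact OSA routine so it runs on its own; the function names are hypothetical, and treating two empty strings as identical (normalized 0, similarity 100 %) is an assumption made to avoid dividing by zero.

```python
def osa_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def scores(a: str, b: str) -> tuple[int, float, float]:
    dist = osa_distance(a, b)
    longest = max(len(a), len(b))
    # Assumption: two empty strings count as identical instead of dividing by zero.
    normalized = dist / longest if longest else 0.0
    similarity = (1 - normalized) * 100
    return dist, normalized, similarity


dist, norm, sim = scores("kitten", "sitting")
print(dist, f"{norm:.3f}", f"{sim:.1f} %")  # 3 0.429 57.1 %
```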

Does it handle Unicode, accents, and emoji? Yes. Comparison runs per Unicode code point, so a precomposed accented letter or a CJK character counts as one. An accent typed as a separate combining mark, and emoji built from several code points (flags, skin-tone variants, or joined sequences), count as more than one.
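Python strings are also indexed by code point, which makes them a handy way to check what counts as "one character" here. The snippet assumes the calculator does not apply Unicode normalization on its own; if your two inputs may come from different sources, normalizing both to NFC with the standard `unicodedata` module keeps a precomposed "é" and an "e" plus combining accent from being counted as an edit.

```python
import unicodedata

composed = "caf\u00e9"         # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"      # "e" followed by a combining acute accent (U+0301)
flag = "\U0001F1EF\U0001F1F5"  # a flag emoji: two regional-indicator code points

print(len(composed), len(decomposed), len(flag))             # 4 5 2
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# For contrast: UTF-16 code units (what JavaScript's String.length counts).
# One emoji outside the Basic Multilingual Plane is a surrogate pair:
print(len("\U0001F600".encode("utf-16-le")) // 2)  # 2
```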

Is there an input limit? Each field accepts up to 2000 characters. The algorithm fills a table with one cell per pair of positions, so its running time grows with the product of the two lengths: two 2000-character inputs mean 4 million cells, while two 10-character words need only 100.
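Running time cannot drop below the product of the lengths with this algorithm, but memory can. The OSA recurrence only looks back two rows, so keeping three rows instead of the full table cuts memory to the length of the shorter string. A minimal Python sketch (the name `osa_distance_rows` is hypothetical):

```python
def osa_distance_rows(a: str, b: str) -> int:
    """OSA distance in O(len(a) * len(b)) time but only O(min length) memory."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays small
    prev2 = []                      # row i-2 (only read once i >= 2)
    prev = list(range(len(b) + 1))  # row i-1, starting with row 0
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent transposition
        prev2, prev = prev, cur
    return prev[len(b)]


print(osa_distance_rows("kitten", "sitting"))  # 3
```

Swapping the inputs so the shorter one runs along each row is safe because the OSA distance is symmetric: exchanging the strings turns insertions into deletions and leaves the transposition test unchanged.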

📚 Edit-Distance Field Notes

The transposition rule comes from Frederick Damerau, who reported in 1964 that the large majority of human spelling errors are a single insertion, deletion, substitution, or a swap of two adjacent letters. That observation became a foundation for spell checkers and fuzzy search, and it still powers "did you mean" suggestions and autocorrect today. In practice, teams rarely use the raw distance alone: they pair it with the length-normalized value or a fixed threshold, treating a distance of 1–2 as "near-identical" for short names and a normalized value under 0.1 for longer text. Measuring those cutoffs on your own data with a tool like this keeps false matches in check.
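Here is one way to turn those rules of thumb into a match test. The cutoffs (at most 2 edits for short strings, a normalized value under 0.1 for longer text) come from the paragraph above. The 12-character boundary between "short" and "long" and the name `is_near_match` are hypothetical placeholders to tune against your own data.

```python
def osa_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def is_near_match(a: str, b: str, short_len: int = 12,
                  max_edits: int = 2, max_ratio: float = 0.1) -> bool:
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty strings are identical
    dist = osa_distance(a, b)
    if longest <= short_len:
        return dist <= max_edits        # short strings: absolute edit budget
    return dist / longest < max_ratio   # longer text: length-normalized budget


print(is_near_match("Jon Smith", "John Smith"))  # True (1 edit)
print(is_near_match("cat", "dog"))               # False (3 edits)
print(is_near_match("the quick brown fox jumps over the lazy dog",
                    "teh quick brown fox jumps over teh lazy dog"))  # True (2/43)
```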