In the last article I discussed an algorithm for creating Soundex codes. In this article I want to show another algorithm called the Levenshtein Distance algorithm or as otherwise known the Edit Distance algorithm. The Levenshtein Distance algorithm is strictly a phonetic algorithm but it calculates how many edits you need to do to turn string A into String B. This can be illustrated in the diagram below.

In the example above, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

**k**itten →**s**itten (substitution of “s” for “k”)- sitt
**e**n → sitt**i**n (substitution of “i” for “e”) - sittin → sittin
**g**(insertion of “g” at the end).

## Levenshtein Distance

Computing the Levenshtein distance is based on the observation that if we reserve a matrix to hold the Levenshtein distances between all prefixes of the first string and all prefixes of the second, then we can compute the values in the matrix in a dynamic programming fashion, and thus find the distance between the two full strings as the last value computed.

First up in the implementation is an interface definition.

namespace PhoneticMatch { public interface IStringDistance { int Distance(string source, string target); } }

Next we have the implementation of the Levenshtien Distance algorithm.

using System; namespace PhoneticMatch { public class LevenshteinDistance : IStringDistance { public int Distance(string source, string target) { if (string.IsNullOrEmpty(source)) { throw new ArgumentNullException("source"); } if (string.IsNullOrEmpty(target)) { throw new ArgumentNullException("target"); } source = source.ToUpper(); target = target.ToUpper(); var distance = new int[source.Length + 1, target.Length + 1]; for (var i = 0; i <= source.Length; i++) { distance[i, 0] = i; } for (var j = 0; j <= target.Length; j++) { distance[0, j] = j; } for (var j = 1; j <= target.Length; j++) { for (var i = 1; i <= source.Length; i++) { if (source[i - 1] == target[j - 1]) { distance[i, j] = distance[i - 1, j - 1]; } else { distance[i, j] = Math.Min(Math.Min( distance[i - 1, j] + 1, //a deletion distance[i, j - 1] + 1), //an insertion distance[i - 1, j - 1] + 1 //a substitution ); } } } return distance[source.Length, target.Length]; } } }

I have also created a few basic unit tests for this Levenshtein Distance implementation written using MSTest. This prove that the algorithm works and also shows the basic usage.

using System; using Microsoft.VisualStudio.TestTools.UnitTesting; using PhoneticMatch; namespace PhoneticMatchTests { [TestClass] public class LevenshteinDistanceTests { [TestMethod] [ExpectedException(typeof(ArgumentNullException))] public void ThrowArgumentNullExceptionIfSourceIsNull() { IStringDistance distance = new LevenshteinDistance(); distance.Distance(null, null); } [TestMethod] [ExpectedException(typeof(ArgumentNullException))] public void ThrowArgumentNullExceptionIfTargetIsNull() { IStringDistance distance = new LevenshteinDistance(); distance.Distance("name", null); } [TestMethod] public void CalculateDistanceOfKittenAndSitten() { IStringDistance distance = new LevenshteinDistance(); Assert.AreEqual(3, distance.Distance("kitten", "sitting")); } [TestMethod] public void CalculateDistanceOfKittenAndSittenWithDifferentCases() { IStringDistance distance = new LevenshteinDistance(); Assert.AreEqual(3, distance.Distance("Kitten", "siTTing")); } } }

Pingback: Phonetic String Matching : Soundex | Stephen Haunts { Coding in the Trenches }

Pingback: Phonetic String Matching : Metaphone | Stephen Haunts { Coding in the Trenches }