Phonetic String Matching : Levenshtein Distance

In the last article I discussed an algorithm for creating Soundex codes. In this article I want to show another algorithm called the Levenshtein Distance algorithm or as otherwise known the Edit Distance algorithm. The Levenshtein Distance algorithm is strictly a phonetic algorithm but it calculates how many edits you need to do to turn string A into String B. This can be illustrated in the diagram below.

Levenshtein Distance Calculated the Number of Changes from One String to Another.

Levenshtein Distance Calculated the Number of Changes from One String to Another.

In the example above, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

      1. kitten → sitten (substitution of “s” for “k”)
      2. sitten → sittin (substitution of “i” for “e”)
      3. sittin → sitting (insertion of “g” at the end).

Levenshtein Distance

Computing the Levenshtein distance is based on the observation that if we reserve a matrix to hold the Levenshtein distances between all prefixes of the first string and all prefixes of the second, then we can compute the values in the matrix in a dynamic programming fashion, and thus find the distance between the two full strings as the last value computed.

First up in the implementation is an interface definition.

namespace PhoneticMatch
{
    public interface IStringDistance
    {
        int Distance(string source, string target);
    }
}

Next we have the implementation of the Levenshtien Distance algorithm.

using System;

namespace PhoneticMatch
{
    public class LevenshteinDistance : IStringDistance
    {
        public int Distance(string source, string target)
        {
            if (string.IsNullOrEmpty(source))
            {
                throw new ArgumentNullException("source");
            }

            if (string.IsNullOrEmpty(target))
            {
                throw new ArgumentNullException("target");
            }

            source = source.ToUpper();
            target = target.ToUpper();

            var distance = new int[source.Length + 1, target.Length + 1];

            for (var i = 0; i <= source.Length; i++)
            {
                distance[i, 0] = i;
            }

            for (var j = 0; j <= target.Length; j++)
            {
                distance[0, j] = j;
            }

            for (var j = 1; j <= target.Length; j++)
            {
                for (var i = 1; i <= source.Length; i++)
                {
                    if (source[i - 1] == target[j - 1])
                    {
                        distance[i, j] = distance[i - 1, j - 1];
                    }
                    else
                    {
                        distance[i, j] = Math.Min(Math.Min(
                        distance[i - 1, j] + 1, //a deletion
                        distance[i, j - 1] + 1), //an insertion
                        distance[i - 1, j - 1] + 1 //a substitution
                        );
                    }
                }
            }

            return distance[source.Length, target.Length];
        }
    }
}

I have also created a few basic unit tests for this Levenshtein Distance implementation written using MSTest. This prove that the algorithm works and also shows the basic usage.

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using PhoneticMatch;

namespace PhoneticMatchTests
{
    [TestClass]
    public class LevenshteinDistanceTests
    {
        [TestMethod]
        [ExpectedException(typeof(ArgumentNullException))]
        public void ThrowArgumentNullExceptionIfSourceIsNull()
        {
            IStringDistance distance = new LevenshteinDistance();
            distance.Distance(null, null);
        }

        [TestMethod]
        [ExpectedException(typeof(ArgumentNullException))]
        public void ThrowArgumentNullExceptionIfTargetIsNull()
        {
            IStringDistance distance = new LevenshteinDistance();
            distance.Distance("name", null);
        }

        [TestMethod]
        public void CalculateDistanceOfKittenAndSitten()
        {
            IStringDistance distance = new LevenshteinDistance();
            Assert.AreEqual(3, distance.Distance("kitten", "sitting"));
        }

        [TestMethod]
        public void CalculateDistanceOfKittenAndSittenWithDifferentCases()
        {
            IStringDistance distance = new LevenshteinDistance();
            Assert.AreEqual(3, distance.Distance("Kitten", "siTTing"));
        }
    }
}
Participate with Coding in the Trenches on Facebook

Participate with Coding in the Trenches on Facebook by Click the button above.

2 thoughts on “Phonetic String Matching : Levenshtein Distance

  1. Pingback: Phonetic String Matching : Soundex | Stephen Haunts { Coding in the Trenches }

  2. Pingback: Phonetic String Matching : Metaphone | Stephen Haunts { Coding in the Trenches }

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s