In the previous parts of this series I discussed the Soundex and Levenshtein Distance algorithms for approximate string matching. In this part I want to introduce the Metaphone algorithm.
Metaphone is a phonetic algorithm for indexing words by their English pronunciation. It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar. As with Soundex, similar-sounding words should share the same key.
In the last article I discussed an algorithm for creating Soundex codes. In this article I want to show another algorithm called the Levenshtein Distance algorithm, also known as the Edit Distance algorithm. The Levenshtein Distance algorithm is not strictly a phonetic algorithm; instead, it calculates how many single-character edits (insertions, deletions, or substitutions) you need to turn string A into string B. This is illustrated in the diagram below.
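In the example above, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits: kitten → sitten (substitute “s” for “k”), sitten → sittin (substitute “i” for “e”), and sittin → sitting (insert “g” at the end).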
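The edit count described above can be computed with dynamic programming. What follows is a minimal Python sketch of the standard Wagner-Fischer approach, keeping only two rows of the table at a time. The function name `levenshtein` is chosen here purely for illustration, and this is not necessarily the implementation used elsewhere in this series.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into an empty string takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping two rows rather than the full table reduces memory use to the length of the second string, at the cost of no longer being able to trace back which edits were made.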
A phonetic algorithm is an algorithm for indexing words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result.
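To make the idea of indexing by pronunciation concrete, here is a minimal Python sketch of one phonetic key, assuming the common American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent repeats, and pad to four characters. The names `CODES` and `soundex` are illustrative, the sketch assumes plain English letters, and other Soundex variants differ in their details.

```python
# Consonant groups that American Soundex treats as sounding alike.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return a four-character Soundex key, or "" if word has no letters."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    last = CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        digit = CODES.get(ch, "")  # vowels and y have no code
        if digit and digit != last:
            key += digit
        last = digit  # a vowel resets the run, so the same code may repeat after it
    return (key + "000")[:4]  # pad with zeros, then truncate to four characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Because "Robert" and "Rupert" produce the same key, a lookup by key would treat them as a match even though the literal strings differ.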
Typically in our applications we compare literal strings, which only gets you so far. Sometimes we need to be a little cleverer and compare strings based on how they sound as opposed to how they are spelt. In this series of articles I want to show implementations of several phonetic comparison algorithms that you can use in your applications. Feel free to take the code from these articles and use it in your software.