I want to code the Metaphone 3 algorithm myself. Is there a description? I know the source code is available for sale but that is not what I am looking for.
6 Answers
Since the author (Lawrence Philips) decided to commercialize the algorithm itself, it is more than likely that you will not find a description. A good place to ask would be the mailing list: https://lists.sourceforge.net/lists/listinfo/aspell-metaphone
but you can also check out the source code (i.e. the code comments) in order to understand how the algorithm works: http://code.google.com/p/google-refine/source/browse/trunk/main/src/com/google/refine/clustering/binning/Metaphone3.java?r=2029
-
well, this is not the answer i was hoping for, but since you managed to find a (legitimate) link to the source code, i should grant you the bounty :) i am not sure why the source code is for sale if it is also available for free (legally), but since the bounty expires in 10 min. i should figure all that out later and get it to you! :) – necromancer May 15 '12 at 19:44
-
on second thoughts it seems like the free source code linked in the answer is complete and functional, and well enough documented to serve as a precise algorithm, so this is a perfect answer and definitely earned the bounty! :) – necromancer May 15 '12 at 21:09
-
@agksmehx The linked page is asking for subscription, i subscribed and then clicked on archive but it says it is restricted. Can you please tell me how to get the code or share the "free source code" – bjan Feb 06 '13 at 04:38
-
@bjan here you go: http://code.google.com/p/google-refine/source/browse/trunk/main/src/com/google/refine/clustering/binning/Metaphone3.java?r=2029 -- this link was in the original answer and is completely legit (donated under BSD license) but somehow the author is getting other people to delete the link from the answer. see the "edits" and you will see the link in the original answer. NOT COOL! – necromancer Feb 24 '13 at 20:15
-
@bjan if you did purchase it ask for a refund, if not from the author then from the person who edited out the link, because it was disingenuously withheld from you here! – necromancer Feb 24 '13 at 20:18
-
The algorithm is so filled with special cases, describing it is almost writing it. – AHungerArtist Jan 21 '14 at 15:28
-
i don't see anywhere in the BSD license where it states that any algorithm disclosed in BSD code may be considered to have been declared public domain. the BSD license specifies only that the source file published may be used without restriction as long as the copyright is preserved and there is no implication that the original author endorses a product that it might be used in. – lawrence philips Aug 30 '14 at 20:15
-
copyright protection is a law, and it has been defined as follows: "A copyright is a legal device that gives the creator of a literary, artistic, musical, or other creative work the sole right to publish and sell that work. Copyright owners have the right to control the reproduction of their work, including the right to receive payment for that reproduction. An author may grant or sell those rights to others, including publishers or recording companies. Violation of a copyright is called infringement." if BSD license requires a copyright, this implies that the (c) owner has legal protection – lawrence philips Aug 30 '14 at 20:25
From Wikipedia, the Metaphone algorithm is
Metaphone is a phonetic algorithm, an algorithm published in 1990 for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names which sound similar [...]
Metaphone 3 specifically
[...] achieves an accuracy of approximately 99% for English words, non-English words familiar to Americans, and first names and family names commonly found in the United States, having been developed according to modern engineering standards against a test harness of prepared correct encodings.
The overview of the algorithm is:
The Metaphone algorithm operates by first removing non-English letters and characters from the word being processed. Next, all vowels are also discarded unless the word begins with an initial vowel, in which case all vowels except the initial one are discarded. Finally, all consonants and groups of consonants are mapped to their Metaphone code. The rules for grouping consonants and then mapping the groups to Metaphone codes are fairly complicated; for a full list of these conversions check out the comments in the source code section.
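To make the three steps in that overview concrete, here is a minimal sketch. It is not Metaphone: `toy_key` and the `TOY_CODES` table are hypothetical names, and the table is a tiny made-up subset chosen only to show the shape of the pipeline. The real rules are context-sensitive and far longer.

```python
import re

VOWELS = set("AEIOU")

# Hypothetical, heavily reduced mapping table for illustration only.
# Two-letter groups are tried before single consonants.
TOY_CODES = {"PH": "F", "SH": "X", "CH": "X", "CK": "K", "C": "K", "Q": "K", "Z": "S"}


def toy_key(word: str) -> str:
    """Return a toy phonetic key following the three steps of the overview."""
    # Step 1: keep only the letters A-Z.
    letters = re.sub(r"[^A-Z]", "", word.upper())
    out = []
    i = 0
    while i < len(letters):
        ch = letters[i]
        # Step 2: drop every vowel except a word-initial one. Vowels are
        # skipped during the scan (not deleted up front) so that consonants
        # separated by a vowel are not mistaken for a consonant group.
        if ch in VOWELS:
            if i == 0:
                out.append(ch)
            i += 1
            continue
        # Step 3: map consonant groups first, then single consonants.
        pair = letters[i:i + 2]
        if pair in TOY_CODES:
            out.append(TOY_CODES[pair])
            i += 2
        else:
            out.append(TOY_CODES.get(ch, ch))
            i += 1
    return "".join(out)
```

With this toy table, `toy_key("Phone")` and `toy_key("fone")` produce the same key, which is the whole point of a phonetic encoding; getting that right across real English spelling is where the complexity lives.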
Now, onto your real question:
If you are interested in the specifics of the Metaphone 3 algorithm, I think you are out of luck (short of buying the source code, understanding it and re-creating it on your own). The whole point of not making the algorithm public (the source you can buy is one instance of it) is that you cannot recreate it without paying the author for their development effort; providing the "precise algorithm" you are looking for is equivalent to providing the actual code itself. Consider the above quotes: the development of the algorithm involved a "test harness of [...] encodings". Unless you happen to have such a test harness or are able to create one, you will not be able to replicate the algorithm.
On the other hand, implementations of the first two iterations (Metaphone and Double Metaphone) are freely available (the above Wikipedia link contains a score of links to implementations in various languages for both), which means you have a good starting point in understanding what the algorithm is about exactly, then improve on it as you see fit (e.g. by creating and using an appropriate test harness).
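For the test-harness idea, a minimal sketch could look like the following. Everything here is an assumption for illustration: `run_harness` is a hypothetical helper, and the expected keys you feed it would have to be prepared by you; nothing below comes from the actual Metaphone 3 harness.

```python
from typing import Callable, Dict, List, Tuple


def run_harness(
    encode: Callable[[str], str],
    expected: Dict[str, str],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Score an encoder against prepared word -> key pairs.

    Returns (accuracy, failures), where each failure is
    (word, expected_key, actual_key).
    """
    failures = []
    for word, want in expected.items():
        got = encode(word)
        if got != want:
            failures.append((word, want, got))
    total = len(expected)
    # An empty harness is treated as trivially passing.
    accuracy = (total - len(failures)) / total if total else 1.0
    return accuracy, failures
```

You would plug in whichever encoder you are modifying (for example a Double Metaphone implementation), rerun the harness after each rule change, and look at the failure list to decide which special case to handle next.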

-
Since the code costs $40.00 from [Amorphics](http://www.amorphics.com/buy_metaphone3.html), it is very far from outrageously priced. The licence terms prohibit the redistribution of the source code; if you find the source code on the web, it is probably not legitimately obtained. You can however build software using the source code and distribute the compiled programs without many restrictions. IANAL; that's my quick interpretation of what it says on the licence page that's a link on the URL. – Jonathan Leffler May 13 '12 at 17:05
-
@JonathanLeffler the incentive for the seller is to make the source difficult to understand, so the seller might as well be selling binaries. i am hesitant to spend money on effectively closed source software. i have yet to encounter any reasonable source code from which an algorithm cannot be derived. if this is a machine learnt source then the secret sauce is a bunch of feature weights, which would be trivial to extract unless the source is obfuscated. either way, it is a conflict-ridden way of distributing source and thus my request for the algorithm itself. – necromancer May 14 '12 at 03:55
-
@agksmehx - I have not read the EULA, but I can imagine it forbids anyone who has the source from distributing it. So you will not get an answer from people who have the source. Those who do not have the source will not be able to answer you because of what I outlined in my answer. So your only option seems to be to buy the source yourself. But why do you want Metaphone 3 specifically? Couldn't you use the other two, freely available versions, just as well? – Attila May 14 '12 at 11:21
-
I have specifically said in the bounty that I am not looking for code. There is no reason an algorithm cannot be described, so I believe you are wrong that I cannot get an answer from those who have the source. See my previous comment regarding the expected quality of the source. – necromancer May 14 '12 at 19:26
-
Yes, I can and will fall back, but I thought it is wrong to have the general community be denied an algorithm (not code). Take a moment to think about most other algorithms in the industry and see how common is it for the algorithm to be suppressed. For example, take the best query optimizer, the source may be restricted but the algorithm is not. – necromancer May 14 '12 at 19:28
-
There's only a very fine line between a mathematical algorithm and the computer program code that implements it. Very fine. – Lightness Races in Orbit May 19 '14 at 20:53
-
The link posted by @Bo now points to the entire source tree of the (now defunct) project.
Here is a new, direct link to the source code for Metaphone 3: https://searchcode.com/codesearch/view/2366000/
by Lawrence Philips
Metaphone 3 is designed to return an approximate phonetic key (and an alternate approximate phonetic key when appropriate) that should be the same for English words, and most names familiar in the United States, that are pronounced similarly. The key value is not intended to be an exact phonetic, or even phonemic, representation of the word. This is because a certain degree of 'fuzziness' has proven to be useful in compensating for variations in pronunciation, as well as misheard pronunciations. For example, although Americans are not usually aware of it, the letter 's' is normally pronounced 'z' at the end of words such as "sounds".
The 'approximate' aspect of the encoding is implemented according to the following rules:
(1) All vowels are encoded to the same value - 'A'. If the parameter encodeVowels is set to false, only initial vowels will be encoded at all. If encodeVowels is set to true, 'A' will be encoded at all places in the word that any vowels are normally pronounced. 'W' as well as 'Y' are treated as vowels. Although there are differences in the pronunciation of 'W' and 'Y' in different circumstances that lead to their being classified as vowels under some circumstances and as consonants in others, for the purposes of the 'fuzziness' component of the Soundex and Metaphone family of algorithms they will always be treated here as vowels.
(2) Voiced and un-voiced consonant pairs are mapped to the same encoded value. This means that:
- 'D' and 'T' -> 'T'
- 'B' and 'P' -> 'P'
- 'G' and 'K' -> 'K'
- 'Z' and 'S' -> 'S'
- 'V' and 'F' -> 'F'
In addition to the above voiced/unvoiced rules, 'CH' and 'SH' -> 'X', where 'X' represents the "-SH-" and "-CH-" sounds in Metaphone 3 encoding.
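The two 'fuzziness' rules quoted above can be sketched in a few lines. This is only an illustration of the folding, not Metaphone 3: `fuzzy_fold` is a hypothetical name, collapsing a run of adjacent vowels into a single 'A' is my assumption, and none of the context-sensitive rules (silent letters, letter combinations, name-specific cases) that make up the bulk of the real code are modelled.

```python
# Rule (1): W and Y are always treated as vowels here.
VOWEL_LIKE = set("AEIOUWY")

# Rule (2): each voiced consonant folds onto its unvoiced partner.
FOLD = {"D": "T", "B": "P", "G": "K", "Z": "S", "V": "F"}


def fuzzy_fold(word: str, encode_vowels: bool = False) -> str:
    """Apply only the quoted folding rules to a word (illustrative sketch)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    out = []
    i = 0
    while i < len(letters):
        ch = letters[i]
        if ch in VOWEL_LIKE:
            # Assumption: a run of adjacent vowels yields a single 'A'.
            j = i
            while j < len(letters) and letters[j] in VOWEL_LIKE:
                j += 1
            # Initial vowels are always encoded; others only on request.
            if i == 0 or encode_vowels:
                out.append("A")
            i = j
        elif ch in "CS" and i + 1 < len(letters) and letters[i + 1] == "H":
            # 'CH' and 'SH' both map to 'X'.
            out.append("X")
            i += 2
        else:
            out.append(FOLD.get(ch, ch))
            i += 1
    return "".join(out)
```

Under these rules alone, "bad" and "pat" fold to the same key, as do "ship" and "chip". Output from the real implementation can differ for many words because its other rules fire first.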

-
the defunct project has moved to github: https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/clustering/binning/Metaphone3.java – necromancer Jul 03 '16 at 05:39
-
* 'D' and 'T' -> 'T' Is this on a special mode? I don't see this happening. example: scooby doo – jeffry copps Aug 08 '19 at 04:35
-
I thought it is wrong to have the general community be denied an algorithm (not code)
I am selling source, so the algorithm is not hidden. I am asking $40.00 for a copy of the source code, and asking other people who are charging for their software or services that use Metaphone 3 to pay me a licensing fee, and also asking that the source code not be distributed by other people (except for an exception I made for Google Refine - i can only request that you do not redistribute the copy of Metaphone 3 found there separately from the Refine package.)

-
Hi Lawrence! #1. good work! #2. my apologies for suspecting that the source code might be obfuscated -- it is in fact, excellently documented! #3. it is your prerogative to restrict source code. #4. if there is a technique beyond source code that is indeed protected, it is your prerogative to protect it via a patent; you somewhat understandably haven't which is either because the patent system is dysfunctional or there is nothing patentable beyond the source (an algorithm typically a mathematical discovery). #5. far more than 2000 hrs has been put into a free s/w ecosystem around your code. – necromancer May 18 '12 at 23:31
-
given all that, it is a reasonable and perfectly legal request on my part. i am glad you contributed your code to google refine and i certainly intend to comply with all licensing terms. given how strongly you feel it is beyond a typical algorithm, i would encourage getting a patent on the novel techniques and license it all over the place. again i cannot overemphasize how much the entire community contributes to the free software world, from gcc to tomcat to linux. that said i believe in capitalism but it is fair game to try to avoid the $ charge (in my case on purely philosophical grounds) – necromancer May 18 '12 at 23:36
-
Hi agks mehx - thanks - i appreciate your kind comments! Actually, it turns out that as of a few years ago the US Patent Office has decided that algorithms are no longer patentable. I don't feel that the algorithm is 'beyond' a normal algorithm; rather, my point is that the algorithm, such as it is, is merely software like any other program, and as such is just a product that I developed in the hopes that i could start a business around it. Having said that, I decided to release it as source so that engineers would be able to understand how it works and easily modify and (hopefully) improve it – lawrence philips May 18 '12 at 23:54
-
Hi Lawrence, I feel bad about any impact on your business, and sorry if my comments came out harsh. I certainly do not intend to publicize it and I am pretty sure many engineers will pay for your software as a thank-you even if there is a free source (I am sure I would if I had a lot more money!). I am not even sure if I will use it in a real product -- sometimes you find out only much later after prototyping. Good luck with the business and I most definitely and deeply appreciate your contribution!! Again I do not intend to distribute or publicize it -- I'm just tinkering and prototyping. – necromancer May 19 '12 at 00:03
Actually, Metaphone3 is an algorithm with many very specific rules that result from analysing test cases. So it is not a pure algorithm; it comes with extra domain knowledge. Obtaining this knowledge and these specific rules took the author a great deal of effort. That's why this algorithm is not open-source.
There is an open-source alternative, though: Double Metaphone. See here: https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html

-
That's nice to know. Apparently Double Metaphone too is by the same author but an older version of Metaphone 3. – necromancer Mar 03 '19 at 04:29
This is not a commercial post and I have no relationship with the owner, but it is worth saying that an implementation of Metaphone3 is available as commercial software from its creator at amorphics.com. It looks like his personal store. It is a Java app, but I bought the Python version and it works fine.
The Why Metaphone3? page says:
One common solution to spelling variation is the database approach. Some very impressive work has been done accumulating personal name variations from all over the world. (Of course, we are always very pleased when the companies that retail these databases advertise that they also use some version of Metaphone to improve their flexibility :-) )
But - there are some problems with this approach:
- They only work well until they encounter a spelling variation or a new word or name that is not already in their database.
Then they don't work at all.
Metaphone 3 is an algorithmic approach that will deliver a phonetic lookup key for anything you enter into it.
- Personal names, that is, first names and family names, are not the same as company names. In fact, the name of a company or agency may contain words of any kind, not just names. Database solutions usually don't cover possible spelling variations, or for that matter misspellings, for regular 'dictionary' words. Or if they do, not very thoroughly.
Metaphone 3 was developed to account for all spelling variations commonly found in English words, first and last names found in the United States and Europe, and non-English words whose native pronunciations are familiar to Americans. It doesn't care what kind of a word you are trying to match.
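The difference between the database approach and the algorithmic approach described in that quote can be sketched as follows. `placeholder_key` is a hypothetical, deliberately crude stand-in for a real phonetic encoder (it is not Metaphone 3), and the names used are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def placeholder_key(word: str) -> str:
    """Crude stand-in encoder: letters only, non-initial vowels dropped,
    doubled letters collapsed. NOT a real Metaphone key."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    out = []
    for i, c in enumerate(letters):
        if i > 0 and c in "AEIOUY":
            continue  # drop vowels after the first letter
        if i > 0 and letters[i - 1] == c:
            continue  # collapse doubled letters such as 'SS'
        out.append(c)
    return "".join(out)


def build_index(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Algorithmic approach: group stored names by their computed key."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        index[placeholder_key(name)].add(name)
    return index


# Database approach: only variants someone has already listed can match.
known_variants = {"Anderson": {"Andersen"}}

index = build_index(["Anderson", "Andersen", "Peterson"])
query = "Andersson"  # a spelling that was never entered anywhere
matches = index.get(placeholder_key(query), set())
```

Here `known_variants.get(query)` finds nothing because "Andersson" was never listed, while the key lookup still returns the two stored spellings. How good the matches are depends entirely on the quality of the encoder.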
For what it is worth, we licensed the code since it is affordable and easy to use. I can't speak to performance yet. There are good alternatives on PyPI but I can't find them at the moment.
