I'm testing a Metaphone implementation for C# and comparing its results against PHP's built-in metaphone() function. I've come across a bug (already documented in PHP's issue tracker and discussed on a mailing list), and I'd like to understand the C code behind it for my own interest.
Basically, according to the Metaphone algorithm, most instances of -gh- should be silent. For the specific test case "wright", I expect (and my own implementation generates) a metaphone key of "RT":
"wr" => R
"i" => ignored
"gh" => ignored
"t" => T
Result: RT
However, PHP's metaphone() returns "RFT". It is clearly converting the -gh- to an F, as it would at the end of a word (e.g. "rough"), but that is incorrect for "wright", where the -gh- is not at the end of the word. Looking at the metaphone.c file in the PHP source distribution, I see a few key things:
/* These prevent GH from becoming F */
#define NOGHTOF(c) (ENCODE(c) & 16) /* BDH */
...
/* Go N letters back. */
#define Look_Back_Letter(n) (w_idx >= n ? toupper(word[w_idx-n]) : '\0')
And then on line 342:
case 'G':
    if (Next_Letter == 'H') {
        if (!(NOGHTOF(Look_Back_Letter(3)) || Look_Back_Letter(4) == 'H')) {
            Phonize('F');
            skip_letter++;
Can someone help me understand what exactly the NOGHTOF macro does and why this code incorrectly renders an F for the -gh- in "wright"? I'm not really a C guy, so the code isn't at all clear to me.
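To make the question concrete, here is my attempt at isolating the condition in a small standalone sketch. It is not PHP's code: noghtof(), look_back_letter() and gh_becomes_f() are names I made up, and noghtof() is only a stand-in that goes by the /* BDH */ comment, since I haven't worked out what ENCODE and its lookup table actually do. I'm also assuming w_idx is the index of the 'g' in the original word (3 for "wright").

```c
#include <ctype.h>

/* ASSUMPTION: stand-in for PHP's NOGHTOF. The real macro tests a flag via
   ENCODE(c); going only by its comment, it should be true for B, D and H. */
static int noghtof(char c) {
    return c == 'B' || c == 'D' || c == 'H';
}

/* Same bounds check as Look_Back_Letter, with word and w_idx passed
   explicitly: returns '\0' when there are fewer than n letters behind. */
static char look_back_letter(const char *word, int w_idx, int n) {
    return w_idx >= n ? (char)toupper((unsigned char)word[w_idx - n]) : '\0';
}

/* Nonzero when the quoted condition would call Phonize('F') for the
   'G' located at w_idx. */
static int gh_becomes_f(const char *word, int w_idx) {
    return !(noghtof(look_back_letter(word, w_idx, 3))
             || look_back_letter(word, w_idx, 4) == 'H');
}
```

If I'm reading this right, for "wright" with w_idx = 3, look_back_letter(word, 3, 3) is 'W' and look_back_letter(word, 3, 4) is '\0', so neither test fires and gh_becomes_f("wright", 3) returns 1, which would match the F that PHP produces. What I don't understand is why the code looks three and four letters back in the first place.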