If this is American Soundex as defined by the National Archives you're both wrong. American Soundex contains one letter and three numbers, you can't have L2222
nor L2220
. It's L222
.
But let's say they added another number for some reason.
The basic substitution gives L2222
. But you're supposed to collapse adjacent letters with the same numbers (step 3 below) and then pad with zeros if necessary (step 4).
If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
If you have too few letters in your word that you can't assign [four] numbers, append with zeros until there are [four] numbers. If you have more than [4] letters, just retain the first [4] numbers.
Lukasieicz # the original word
L_2_2___22 # replace with numbers, leave the gaps in
L_2_2___2 # apply step 3 and squeeze adjacent numbers
L2220 # apply step 4 and pad to four numbers
We can check how conventional (ie. three number) soundex implementations behave with the shorter Lukacz
which becomes L_2_22
. Following rules 3 and 4, it should be L220
.
The National Archives recommends an online Soundex calculator which produces L220
. So does PostgreSQL and Text::Soundex in both its original flavor and NARA implementations.
$ perl -wle 'use Text::Soundex; print soundex("Lukacz"); print soundex_nara("Lukacz")'
L220
L220
MySQL, predictably, is doing its own thing and returns L200
.
This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second.
In conclusion, you forgot the squeeze step.