I'm currently writing a script tasked with going through tens of thousands of rows of account information and cleaning mistyped addresses, as well as printing out reports on how the address was cleaned. Currently the biggest source of unclean addresses is mistyped street-names (it's amazing how many ways you can spell a street-name). In any case, currently my script grabs the input street-name and performs a series of edits specific to the Norwegian language (v.
becomes vegen
, gt.
becomes gata
etc.) and searches for the street-name in a ~2 million row database of addresses. If it doesn't find a match it proceeds to split off the latter half of the street-name and replacing it with a wildcard. It tries out different variations of the wildcard search.
Anyway, my question is:
Does MySQL include anything that could make this easier for me? I recall hearing mention of a "search" function in MySQL that finds the cells in a column with the most matching characters or something. In the cases where my wild-card search fails it would be a great tool to have.
Anything else that would help with finding matches to mistyped addresses would be great.