0

I'm trying to use a Levenstein algo I found here to clean a huge amount of data. But having trouble implementing it.

I have 100,000 rows of excel data.

One of the columns contains a city name, these have multiple typos (hence levenstein)

I have a list of 1,000,000 cities around the world. I am trying to find the best way to query both sets of data and return the correct spelling of the city for each instance.

Currently I have =IF(Levenshtein(J5,$K$4)<4, $K$4, "No Match")

But this would need to repeated for each of my one million cities individually.

My ideal solution is to add another column, insert a formula and end up with the correct spelling of that cities name.

Community
  • 1
  • 1

1 Answers1

0

This 'Fuzzy Lookup Add-In for Excel' should do the trick :)

http://www.microsoft.com/en-gb/download/details.aspx?id=15011

Tim
  • 756
  • 1
  • 7
  • 12
  • This didn't work. 1. My datasets are too large and excel runs out of memory when I run this 2. It wouldnt do what I wanted anyway, I woukd still need to run a new fuzzy vlookup for every city I wanted to match (which is one million) – Max Holloway Sep 17 '14 at 07:50
  • Yes I was afraid of that, dealing with a 100,000 * 1m iterations is going to jam up Excel whatever you use. Where does the data originate? Do you have access to a proper database such as SQL Server? – Tim Sep 17 '14 at 07:54