-1

I'm trying to use fuzzy lookup to match a list of correct names with a set of "dirty" names. But apparently vba only uses one core of my processors and it takes too much time because I am using it on at least 5000 names.

Here's a link to the fuzzy code: https://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html#post955137

I also researched about "multi-threading" solutions for VBA and I found that there's no native way of doing it but someone found made an alternative using some scripts.

Here's the link for the multithreading vba script tool: https://analystcave.com/excel-vba-multithreading-tool/

Now, all I need to do is to integrate the lookup code to this multithreading script so that it will speed up the processing of this function. I am assuming that this is possible right?

Can someone help me with this? I only learned VBA through googling and reading other codes but this vba multithread tool is quite complicated for a beginner like me.

Thank you very much!

Community
  • 1
  • 1
kimminho25
  • 1
  • 1
  • 1
  • 4
  • 1
    I've had a look at the Fuzzy code. For 5000 executions it runs pretty fast. Clearly the longer the string the longer the run time, but I'm assuming by "names" you mean pretty short strings. For strings of 100 characters, 5000 executions or Algorithm 2 takes about 1 to 2s (depending on which string is longer) on my hardware. Or are you doing a O(n2) loop within loop? (_that_ would be slow!) So I would suspect there is something else about your code that is taking too long. Please update your Q with the code you are using, and include some sample data – chris neilsen May 24 '18 at 23:22

2 Answers2

0

I'm not qualified to address the multithreading, but about your speed issue: are you running the code directly on the spreadsheet?

A better method is to import the entire table or range into an Array, and run the code on it there while it's in computer memory. It runs MUCH faster there. Then paste the results into the spreadsheet.

Here's some info on pulling the data into an array:

Creating an Array from a Range in VBA

http://www.cpearson.com/excel/ArraysAndRanges.aspx

You'll have to fiddle with the rest of your code, but basically you'll treat the array as if it were a table.

Shawn V. Wilson
  • 1,002
  • 3
  • 17
  • 42
0

Below is an excerpt from Microsoft website. I believe their C# based add-in Fuzzy Lookup for MS-Excel is multi-threading based and much faster than the code you provide. Why to re-invent the wheel when we have a better option available.

The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel. It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data. For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match. While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages. The following libraries are required and will be installed if necessary: .NET 4.5 VSTO 4.0

jainashish
  • 4,702
  • 5
  • 37
  • 48