
I have an existing set of manually mapped geographic region strings, as follows:

Czech Republic-Construction     Emerging      CEEMEA            CEEMEA
Czech Republic-Residential      Emerging      CEEMEA            CEEMEA
Czech-Slovakia                  Emerging      CEEMEA            CEEMEA
Daiichi Sankyo US               Developed     North America     North America
Dailian                         Emerging      China             Asia
Daimaru                         Developed     Japan             Japan
Dairy products                  Other         Other             Other
Dalian                          Emerging      China             Asia

As you can see, I map each region to its proper geographic location, and any companies to 'Other'. The new regions that I encounter often contain spelling mistakes, so I use a set of algorithms to check whether a new string is close enough to one that is already mapped. If it is, I copy the existing mapping to the new region.

This is how I currently combine the algorithms:

//Levenshtein-Distance
if(LevenshteinDistance == 1) 
    Match string to existing entry.
else if(LevenshteinDistance == 2)
    if(Jaro-Winkler > 0.85) 
        Match string to existing entry.
else if(LevenshteinDistance == 3)
    if(WildCardMatching)
        if(Jaro-Winkler)
            Match string to existing entry.
        else
            Add String to List for Manual Mapping.
else
    Add String to List for Manual Mapping.

Wildcard matching algorithm: http://www.geeksforgeeks.org/wildcard-character-matching/

Jaro-Winkler algorithm: Jaro–Winkler distance algorithm in C#
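
For illustration, the matching flow above could be sketched in Python roughly as follows. This is not the original C# code. The function names, the `wildcard_match` callback (its pattern is not given here, so the default simply rejects), the reuse of the 0.85 Jaro-Winkler threshold in the distance-3 branch, and returning the first matching candidate are all assumptions.

```python
from typing import Callable, Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Jaro-Winkler similarity in [0, 1]; 1.0 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    # Winkler bonus for a shared prefix of up to four characters.
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


def find_match(
    new_region: str,
    known_regions: Iterable[str],
    # Hypothetical placeholder for the wildcard check; rejects by default.
    wildcard_match: Callable[[str, str], bool] = lambda new, known: False,
    jw_threshold: float = 0.85,  # assumed to apply to the distance-3 branch too
) -> Optional[str]:
    """Return the known region whose mapping to copy, or None for manual mapping."""
    for known in known_regions:
        dist = levenshtein(new_region, known)
        score = jaro_winkler(new_region, known)
        if dist == 1:
            return known
        if dist == 2 and score > jw_threshold:
            return known
        if dist == 3 and wildcard_match(new_region, known) and score > jw_threshold:
            return known
    return None
```

With this sketch, `find_match("Dailian", ["Daimaru", "Dalian"])` returns `"Dalian"`, and a string with no close neighbour returns `None`, which corresponds to the manual-mapping list.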

My question is: even after this, I still find entries that are mapped wrongly, e.g. Labor and Gabon. Is there a way to add more algorithms, or to change how I currently use these ones, to get a better matching flow?
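
To make the Labor/Gabon case concrete, here is a minimal Python sketch that compares the raw edit distance with a length-normalized one. The `normalized_distance` helper is a hypothetical illustration, not part of the flow above, and whether a relative cutoff actually improves the matching is an assumption that would need checking against real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so 0.0 means identical."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


for new, known in [("Labor", "Gabon"), ("Dailian", "Dalian")]:
    print(new, known, levenshtein(new, known),
          round(normalized_distance(new, known), 3))
```

Labor and Gabon are two substitutions apart, which is 40% of a five-letter string, while Dailian and Dalian differ by one deletion out of seven characters. A fixed absolute distance treats short and long strings the same, which may be one reason short names collide.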

Thank you for any help.

Jay Nirgudkar
  • Could you count the number of occurrences of every word? If a word has three or more occurrences, you can assume it isn't wrong. – David Pérez Cabrera Jun 30 '15 at 20:48
  • That's actually a good solution. The only problem is this: suppose I get a region "USA, Africa, Asia, Latin-America" and I already have something like "USA, Asia, Africa" (without Latin America), which is mapped to North America, Asia, Africa. I need to add Latin-America to the mapping for the new region as well, and in that case your logic might not work. – Jay Nirgudkar Jul 01 '15 at 09:47
  • But thanks, it will at least help me get a few more mappings right and reduce my manual work. :) – Jay Nirgudkar Jul 01 '15 at 09:48
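
The word-count idea from the first comment could be sketched in Python like this. The `trusted_words` name, the letters-only tokenization, and the case folding are assumptions made for illustration; only the threshold of three occurrences comes from the comment.

```python
import re
from collections import Counter
from typing import Iterable, Set


def trusted_words(regions: Iterable[str], min_count: int = 3) -> Set[str]:
    """Return words that occur at least min_count times across all regions."""
    counts = Counter(
        word.lower()
        for region in regions
        # Assumed tokenization: runs of ASCII letters, split on anything else.
        for word in re.findall(r"[A-Za-z]+", region)
    )
    return {word for word, count in counts.items() if count >= min_count}


regions = [
    "Czech Republic-Construction",
    "Czech Republic-Residential",
    "Czech-Slovakia",
    "Dailian",
    "Dalian",
]
print(trusted_words(regions))
```

Here only "czech" reaches three occurrences, so it would be treated as correctly spelled, while "Dailian" and "Dalian" would still go through the fuzzy-matching flow.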

0 Answers