0

I'm trying to compare columns of text where the text columns look something like:

  1. C276 .016" x 1.18" (#) (279.4 lbs each)
  2. C22 .016" x 1.125" (#)

Comparing the above to a list that looks like:

  1. C276 .016" x 1.18" (#) (279.4 lbs each)
  2. 302SS .055, 1/2" OD

I am not looking for exact matches, rather trying to find strings that are as similar as possible. VLOOKUP hates non-compact text strings and my data has quite a lot of those as well as a whole host of characters. What is a way that I can get around this?

user2578013
  • 73
  • 2
  • 4
  • 12
  • 5
    `I am not looking for exact matches, rather trying to find strings that are as similar as possible.` - look into [Levenstein distance](http://en.wikipedia.org/wiki/Levenshtein_distance). Here is [VBA implementation](http://stackoverflow.com/questions/4243036/levenshtein-distance-in-excel). – Dmitry Pavliv Apr 11 '14 at 15:05
  • is there a more efficient way to do this besides using Levenstein distance? – user2578013 Apr 11 '14 at 15:37
  • It depends on how your values are different, whether there is any pattern? But in general I don't think you can do it in easier way – Dmitry Pavliv Apr 11 '14 at 15:51
  • 1
    I ended up going through with this. I set up a matrix of sorts. X-Axis my dependents, and Y my independents: `=IFERROR(IF(Levenshtein3($BK$1,C10) > 55, "MATCH", 0), 0)` and checked each value in the X row against a Y. I'm going to be changing the Y values and hopefully catch the strings that match. Thanks! – user2578013 Apr 11 '14 at 20:22

1 Answers1

1

It depends how pervasive the "odd characters" are. You might be able to use wildcard characters * ? to approach an acceptable match. Generally there is not an easy way to process "dirty" data through the VLOOKUP formula. Your best solution is to clean up the data.

If that is not possible, then you may have to give up on VLOOKUP formula and use vba code instead. You would have to write code that could parse the string and identify smaller string(s) with which to make a unique match. It can be done, but it takes a lot of work and refinement.

LeftyMaus
  • 360
  • 3
  • 13