I would like to merge two DataFrames on a shared string column, but the values in that column sometimes differ slightly. Is there a way to do that in pandas?

FrameA  
    someNumber  street
0    1          A Street
1    2          B street

FrameB
    someNumber  street
0    1          A Str
1    2          B st

In the frames above I would like to join on the 'street' column, but treat, for example, 'A Street' and 'A Str' as the same value. Ideally there would be some kind of threshold: if the edit distance between two values is at most, say, 4, I would like to treat them as equal.
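
To make the requested behaviour concrete, here is a minimal sketch, assuming plain Levenshtein distance as the edit distance and a cutoff of 4. `edit_distance` and `fuzzy_merge` are hypothetical helper names written for this illustration, not pandas functions, and the frame names `frame_a` and `frame_b` are also assumptions. The sketch compares every row of one frame with every row of the other, so it only suits small frames.

```python
import pandas as pd


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts,
    deletes, or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_merge(left, right, on, max_distance=4):
    """Hypothetical helper: pair up rows whose `on` values are within
    `max_distance` edits of each other (case-sensitive)."""
    # Build every left/right combination via a constant join key.
    pairs = (
        left.assign(_key=1)
        .merge(right.assign(_key=1), on="_key", suffixes=("_a", "_b"))
        .drop(columns="_key")
    )
    pairs["distance"] = [
        edit_distance(x, y)
        for x, y in zip(pairs[on + "_a"], pairs[on + "_b"])
    ]
    # Keep only the pairs that fall within the threshold.
    return pairs[pairs["distance"] <= max_distance].reset_index(drop=True)


frame_a = pd.DataFrame({"someNumber": [1, 2], "street": ["A Street", "B street"]})
frame_b = pd.DataFrame({"someNumber": [1, 2], "street": ["A Str", "B st"]})

merged = fuzzy_merge(frame_a, frame_b, on="street", max_distance=4)
print(merged)
```

With the two frames above, this keeps the pairs ('A Street', 'A Str') at distance 3 and ('B street', 'B st') at distance 4, and drops the two cross pairs at distances 6 and 5. The margin is narrow: on strings this short, a slightly larger threshold or a case-insensitive comparison would start matching 'B street' with 'A Str'.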

tural
  • 'Differ slightly' is relative and subjective. You need to define the rules by which one value counts as only slightly different from another. Neither pandas nor Python knows what you mean by slightly different. – Marcin Sep 28 '15 at 00:12
  • Sorry for being vague. I was thinking of edit distance, that is, the number of single-character changes (insert, delete, update) needed to make one string exactly match the other. If that distance is, say, less than 5, treat the values as equal. I hope this explanation is clear enough. – tural Sep 28 '15 at 00:26

0 Answers