I have a general question. I am trying to do string matching between data frames in R. My strings have the following format:
"COOL FOODS LTD 222 HIGH ST LONDON ABC123"
I would like to iterate over other data frames and have my code find matches between the string above and strings like these:
"222 HIGH ST LONDON ABC123 COOL FOODS LTD "
"HIGH LTD ST 222 LONDON COOL ABC123 FOODS "
"COOL FOODS LTD 222 HIGH ST LONDON UNITED KINGDOM ABC123"
I tried adist, but the similarity scores I get using that method are not very good when parts of the string are rearranged or when the inserted part is long (as per the examples).
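To illustrate, here is a minimal sketch of the kind of adist call I mean, not my exact code. The names target and candidates are just for this example, and dividing by the longer string's length is only one assumed way of turning the edit distances into similarity scores:

```r
# Example strings from above (names are illustrative only)
target <- "COOL FOODS LTD 222 HIGH ST LONDON ABC123"
candidates <- c(
  "222 HIGH ST LONDON ABC123 COOL FOODS LTD ",
  "HIGH LTD ST 222 LONDON COOL ABC123 FOODS ",
  "COOL FOODS LTD 222 HIGH ST LONDON UNITED KINGDOM ABC123"
)

# adist() (utils package) returns a matrix of generalized Levenshtein
# distances: one row per element of x, one column per element of y
d <- adist(target, candidates)

# Assumed normalisation: 1 - distance / length of the longer string
longest <- pmax(nchar(target), nchar(candidates))
similarity <- 1 - as.vector(d) / longest
print(similarity)
```

Because the distance is character based, moving a block of words counts as many edits, which is why the rearranged strings score poorly.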
I thought about splitting my strings by white spaces, but I'm not sure how to then do the matching and comparing efficiently with many data frames.
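This is roughly what I have in mind for the splitting idea, as a minimal base-R sketch. The helpers tokenize and token_jaccard are hypothetical names I made up, and Jaccard similarity on the sets of words is just one assumed way to score the overlap:

```r
# Hypothetical helper: split a string on whitespace into unique upper-case tokens
tokenize <- function(x) {
  unique(strsplit(trimws(toupper(x)), "\\s+")[[1]])
}

# Hypothetical helper: Jaccard similarity of two token sets
# (shared tokens divided by all distinct tokens); word order is ignored
token_jaccard <- function(a, b) {
  ta <- tokenize(a)
  tb <- tokenize(b)
  length(intersect(ta, tb)) / length(union(ta, tb))
}

target <- "COOL FOODS LTD 222 HIGH ST LONDON ABC123"
candidates <- c(
  "222 HIGH ST LONDON ABC123 COOL FOODS LTD ",
  "HIGH LTD ST 222 LONDON COOL ABC123 FOODS ",
  "COOL FOODS LTD 222 HIGH ST LONDON UNITED KINGDOM ABC123"
)

# Score every candidate against the target
scores <- vapply(
  candidates,
  function(s) token_jaccard(target, s),
  numeric(1),
  USE.NAMES = FALSE
)
print(scores)  # 1.0 1.0 0.8
```

The two rearranged strings score 1 because they contain exactly the same words, and the string with "UNITED KINGDOM" inserted scores 0.8 (8 shared words out of 10 distinct ones). What I don't know is how to apply something like this efficiently across many data frames.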
I would be grateful for any suggestions!
Cheers!