How to compare two different sets of strings with similar but different names?

Question

I have two dataframes, both containing company names, that look like this:

import pandas as pd

df1 = pd.DataFrame({'Company': ["Apple", "Facebook", "Google"]})
df2 = pd.DataFrame({'Company': ["Apple Inc. Common Stock", "Facebook Common Stock", "Google Alphabet Inc.", "AAON Inc."]})

df1
[Out 6]:
Company
Apple 
Facebook 
Google

df2
[Out 6]:
Company
Apple Inc. Common Stock
Facebook Common Stock
Google Alphabet Inc.
AAON Inc.

I want to compare the two columns so that I can identify which companies in df1 appear in df2, even though the strings are not exact matches. Is there a way to test for similarity so that I can pick out these companies when the names aren't exactly the same?

Essentially, I want code that produces df3 as its output, where:

df3
[Out 6]:
Company      Company                       MATCH?
Apple        Apple Inc. Common Stock       YES
Facebook     Facebook Common Stock         YES
Google       Google Alphabet Inc.          YES
NA           AAON Inc.                     NO

I also cannot use fuzzywuzzy for the matches. It must be done with pandas or numpy.

score 0 · Answer 1 · answered Aug 18 '22 at 16:13

Try:

my_dict = {key: any(df2.Company.str.contains(key)) for key in df1.Company}
df1['new_col'] = df1.Company.map(my_dict)
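
If the goal is the df3 table from the question, one way to build on the substring idea above is to look up, for each name in df2, the first df1 name it contains. This is only a sketch: the `first_match` helper and the `Company_df1` / `Company_df2` column names are made up for illustration, and plain substring matching will also flag unrelated names that happen to contain a key (for example "Apple" inside "Pineapple").

```python
import pandas as pd

df1 = pd.DataFrame({"Company": ["Apple", "Facebook", "Google"]})
df2 = pd.DataFrame({"Company": ["Apple Inc. Common Stock", "Facebook Common Stock",
                                "Google Alphabet Inc.", "AAON Inc."]})

def first_match(long_name, short_names):
    """Hypothetical helper: first short name found inside long_name, else None."""
    lowered = long_name.lower()
    for short in short_names:
        # Case-insensitive literal substring test, no regex involved
        if short.lower() in lowered:
            return short
    return None

# One row per df2 name, with the matching df1 name (or a missing value) beside it
df3 = pd.DataFrame({
    "Company_df1": df2["Company"].map(lambda name: first_match(name, df1["Company"])),
    "Company_df2": df2["Company"],
})
df3["MATCH?"] = df3["Company_df1"].notna().map({True: "YES", False: "NO"})
print(df3)
```

Because the comparison is a literal `in` test, special characters in company names (parentheses, dots, plus signs) are matched as-is rather than interpreted as a pattern.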

Nuri Taş


I get the following error when I input this: `error: missing ), unterminated subpattern at position 14` – RafaelP Aug 18 '22 at 16:34
Your error isn't raised by the code above. Try looking for mismatched parentheses in your code – Nuri Taş Aug 18 '22 at 17:06
My code is exactly the same as the one there. `my_dict = {key: any(companies.Name.str.contains(key)) for key in leads.EMPLR_NM} companies['MATCH'] = companies.Name.map(my_dict)` – RafaelP Aug 18 '22 at 18:30
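
A note on the error in the comments: that message is the kind Python's `re` module raises for a pattern with an unbalanced parenthesis, and `Series.str.contains` treats its argument as a regular expression by default. So a likely cause (an assumption, since the real data isn't shown) is that one of the values in `leads.EMPLR_NM` contains a `(` without a matching `)`. The sketch below uses invented names in hypothetical `companies` and `leads` frames to show two ways around it: literal matching with `regex=False`, or escaping each key with `re.escape`. It also maps the result back onto the column that supplied the keys, as the answer does; the comment's version maps `companies.Name` through a dict keyed by `leads.EMPLR_NM`, which would yield NaN for any name that is not itself a key.

```python
import re
import pandas as pd

# Hypothetical data: the real `leads` and `companies` frames are not shown in the thread
companies = pd.DataFrame({"Name": ["Acme Holdings (US) Inc.", "Globex Corp"]})
leads = pd.DataFrame({"EMPLR_NM": ["Acme Holdings (US", "Initech"]})

# A key with an unbalanced "(" is not a valid regular expression
bad_key = leads["EMPLR_NM"].iloc[0]
regex_error = None
try:
    re.compile(bad_key)
except re.error as exc:
    regex_error = str(exc)
print("regex error:", regex_error)

# Option 1: treat each key as a literal substring
literal = {
    key: bool(companies["Name"].str.contains(key, regex=False).any())
    for key in leads["EMPLR_NM"]
}

# Option 2: keep regex matching but escape special characters in each key
escaped = {
    key: bool(companies["Name"].str.contains(re.escape(key)).any())
    for key in leads["EMPLR_NM"]
}

# Map onto the same column the keys came from
leads["MATCH"] = leads["EMPLR_NM"].map(literal)
print(leads)
```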
