I have two DataFrames of company names that look like this:
import pandas as pd

df1 = pd.DataFrame({'Company': ["Apple", "Facebook", "Google"]})
df2 = pd.DataFrame({'Company': ["Apple Inc. Common Stock", "Facebook Common Stock", "Google Alphabet Inc.", "AAON Inc."]})
df1:
Company
Apple
Facebook
Google

df2:
Company
Apple Inc. Common Stock
Facebook Common Stock
Google Alphabet Inc.
AAON Inc.
I want to compare the two columns to identify which companies in df1 also appear in df2, but the strings are not exact matches. Is there a way to test for similarity so that I can still match these companies even though the names aren't exactly the same?
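One simple similarity test, sketched below, assumes that each df1 name appears verbatim (ignoring case) as a whole word somewhere inside the matching df2 name. This is a substring heuristic, not true fuzzy matching; the variable names `pattern` and `mask` are only illustrative.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Company': ["Apple", "Facebook", "Google"]})
df2 = pd.DataFrame({'Company': ["Apple Inc. Common Stock", "Facebook Common Stock",
                                "Google Alphabet Inc.", "AAON Inc."]})

# Build one alternation pattern like r"\b(?:Apple|Facebook|Google)\b";
# re.escape guards against names that contain regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(name) for name in df1['Company']) + r")\b"

# True where a df2 name contains any df1 name as a whole word (case-insensitive).
mask = df2['Company'].str.contains(pattern, case=False, regex=True)
print(mask.tolist())  # [True, True, True, False]
```

The `\b` word boundaries keep a short name such as "Apple" from matching inside an unrelated longer word.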
Essentially, I want code whose output is a df3 like this:
df3:
Company (df1)   Company (df2)             MATCH?
Apple           Apple Inc. Common Stock   YES
Facebook        Facebook Common Stock     YES
Google          Google Alphabet Inc.      YES
NA              AAON Inc.                 NO
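A minimal sketch that builds a df3 of this shape with pandas `Series.str.extract`, under the same whole-word substring assumption as above. The column labels `Company (df1)` / `Company (df2)` and the `lookup` dictionary are illustrative choices, not part of any API.

```python
import re
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Company': ["Apple", "Facebook", "Google"]})
df2 = pd.DataFrame({'Company': ["Apple Inc. Common Stock", "Facebook Common Stock",
                                "Google Alphabet Inc.", "AAON Inc."]})

names = df1['Company'].tolist()
# Capturing group so str.extract returns the matched text (or NaN if nothing matches).
pattern = r"\b(" + "|".join(map(re.escape, names)) + r")\b"
# Map the matched text back to df1's original spelling, since matching ignores case.
lookup = {name.lower(): name for name in names}

matched = df2['Company'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

df3 = pd.DataFrame({
    'Company (df1)': matched.str.lower().map(lookup),  # NaN stays NaN
    'Company (df2)': df2['Company'],
})
df3['MATCH?'] = np.where(df3['Company (df1)'].notna(), 'YES', 'NO')
print(df3)
```

Because it walks over the df2 rows, a df1 name that matches nothing would not appear in df3; such names could be appended afterwards if they are needed.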
I also cannot use fuzzywuzzy for the matches. It must be done with pandas or numpy.
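If the df2 names are not guaranteed to contain the df1 string exactly (for example, because of typos), a score-based approach may hold up better. The sketch below swaps in a character-bigram containment score (the share of a df1 name's two-character substrings that also occur in the df2 name), computed with plain Python sets and NumPy and no third-party matching library. The helpers `bigrams` and `containment` and the 0.8 threshold are hypothetical and would need tuning on real data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Company': ["Apple", "Facebook", "Google"]})
df2 = pd.DataFrame({'Company': ["Apple Inc. Common Stock", "Facebook Common Stock",
                                "Google Alphabet Inc.", "AAON Inc."]})

def bigrams(s):
    """Hypothetical helper: set of lowercase character bigrams, 'Apple' -> {'ap','pp','pl','le'}."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def containment(short, long):
    """Hypothetical helper: fraction (0..1) of short's bigrams that also occur in long."""
    a, b = bigrams(short), bigrams(long)
    return len(a & b) / len(a) if a else 0.0

# Score matrix: one row per df2 name, one column per df1 name.
scores = np.array([[containment(a, b) for a in df1['Company']] for b in df2['Company']])
best = scores.argmax(axis=1)      # index of the best-scoring df1 name for each df2 row
best_score = scores.max(axis=1)

THRESHOLD = 0.8                   # hypothetical cut-off, not a recommended value
is_match = best_score >= THRESHOLD

df3 = pd.DataFrame({
    'Company (df1)': np.where(is_match, df1['Company'].to_numpy()[best], None),
    'Company (df2)': df2['Company'],
    'MATCH?': np.where(is_match, 'YES', 'NO'),
})
print(df3)
```

On the sample data each true pair scores 1.0, while "AAON Inc." shares no bigrams with any df1 name, so it comes out as NO. The score matrix is len(df2) × len(df1), so for large tables it may be worth pre-filtering candidates (for example, by first letter) before scoring.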