I have two dataframes of different size but common columns, with ID numbers as key column.
I want to compare the column "names" in both dataframes.
But sometimes the "names" does not match because there are upper cases or there are additional uncommon words.
Is there a way to "match" the column with something like a "tolerance", 70% matching words?
Here is an example:
import pandas as pd
data1 = {'key_column': ['1457', '2356', '2268', '1497','2298'],
'names': ['Birds are flying', 'Flowers', 'pink bird', 'pink bird', 'Sun Beach'],
'random_column_in_df1':['1', '2', '2', '2', '2'],
}
data2 = {'key_column': ['2268', '2356', '2298', '1497'],
'names': ['bird', 'flowers here', 'Sun','some text'],
'random_column_in_df2':['1', '3', '2', '3'],
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1
df2
expected output: