I have 2 dataframes that I would like to merge on a particular column based on a string contains. It seems like the following question, but in a different order: How to merge pandas on string contains?
import pandas as pd
df1 = pd.DataFrame({'Amount':[10, 20, 30], 'Description':['this is a text','this is another text','this is an important']})
df2 = pd.DataFrame({'Text':['another','important'], 'Category':['Another Category','Important Category']})
rhs = (df1.Description
.apply(lambda x: df2[df2['Category']] if df2[df2['Text']] in str(x).lower() else None)
)
(pd.concat([df1.Amount, rhs], axis=1, ignore_index=True)
.rename(columns={0: 'Amount', 1: 'Category'}))
I got the following error message:
KeyError: "None of [Index(['another', 'important'], dtype='object')] are in the [columns]"
This occurs because of the lambda expression. With the df2[df2['Text']] part I try to iterate through the dataframe containing the categories, but this doesn't work.