How to merge two dataframes based on a string contains?

Question

I have 2 dataframes that I would like to merge on a particular column based on a string contains. It seems like the following question, but in a different order: How to merge pandas on string contains?

import pandas as pd

df1 = pd.DataFrame({'Amount':[10, 20, 30], 'Description':['this is a text','this is another text','this is an important']})
df2 = pd.DataFrame({'Text':['another','important'], 'Category':['Another Category','Important Category']})

rhs = (df1.Description
          .apply(lambda x: df2[df2['Category']] if df2[df2['Text']] in str(x).lower() else None)
      )

(pd.concat([df1.Amount, rhs], axis=1, ignore_index=True)
 .rename(columns={0: 'Amount', 1: 'Category'}))

I got the following error message:

KeyError: "None of [Index(['another', 'important'], dtype='object')] are in the [columns]"

This occurs because of the lambda expression. With the df2[df2['Text']] part I try to iterate through the dataframe containing the categories, but this doesn't work.

Can you share a sample of the desired output? – iDrwish Aug 25 '19 at 10:05 — iDrwish, Aug 25 '19 at 10:05

score 1 · Accepted Answer · answered Aug 25 '19 at 10:11

Assuming that df2 is a unique table of texts and their categories, I suppose this could work. (assuming the dfs are as you've posted)

join_map = {row['Text']:row['Category'] for ind,row in df2.iterrows()}

df1['Category'] = df1['Description'].apply(lambda x: [val for key,val in join_map.items() if key in x][0] if [val for key,val in join_map.items() if key in x] else None)

How to merge two dataframes based on a string contains?

1 Answers1