1

I understand merging two dataframes can be done using the below code

left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')

Merging process was explained in detail in this link
However, I have dataframes which have text as follows

Dataframe1

 61     ability produce required documents based trans...
 237    ability setup track definable error sources
 440    ability log discrpeancies received vs shipped ...
1786    training education cover supply chain principl...
1931    system show estimated cost make reslotting mov...        ​

Dataframe2

    KeyWords
0   ability
1   require
2   document
3   base
4   transportation

Now I want to join the dataframes and I want the rows to be duplicated as Dataframe2 has words which can come in multiple rows of Dataframe1.

When I use a simple merge, I get a default NULL values

input_file = Dataframe2.merge(Dataframe1, left_on='KeyWords', right_on='Questions', how = 'left')

    KeyWords       Questions
0   ability         NaN
1   require         NaN
2   document        NaN
3   base            NaN
4   transportation  NaN

How do I join so that I get the values out ? Thank you

My expected output should be like this.

KeyWords   Questions 
ability    ability produce required documents based trans...
ability    ability setup track definable error sources
ability    ability log discrpeancies received vs shipped ...
Vinay billa
  • 309
  • 2
  • 4
  • 17

2 Answers2

1

One way using pandas.Series.str.contains:

df2['Questions'] = df2['KeyWords'].apply(lambda x: df1['Qs'][df1['Qs'].str.contains(x)].index.tolist())
print(df2)

Output:

         KeyWords       Questions
0         ability  [61, 237, 440]
1         require            [61]
2        document            [61]
3            base            [61]
4  transportation              []
Chris
  • 29,127
  • 3
  • 28
  • 51
1

We can use Series.str.split + DataFrame.explode. Then use DataFrame.merge:

df1['list_words']=df1['Questions'].str.split(' ')
new_df1=df1.explode('list_words')
df_merge=df2.merge(new_df1,left_on='KeyWords',right_on='list_words',how='inner').drop('list_words',axis=1)
print(df_merge)

Output

   KeyWords                                            Questions 
0   ability    ability produce required documents based trans...
1   ability    ability setup track definable error sources
2   ability    ability log discrpeancies received vs shipped ...
ansev
  • 30,322
  • 5
  • 17
  • 31
  • whenever i write final_df5['list_words']=final_df5['Questions'].str.split(' '), I get the following error : KeyError: 'Questions' – Vinay billa Nov 11 '19 at 11:44
  • 1
    you have to make sure that your column is called Questions and that its content is of type string. Could you show the ouput `final_df5.columns` and `final_df5['Questions']`? – ansev Nov 11 '19 at 11:46