I have a pandas DataFrame in which I would like to compare two columns, 'text' and 'text_dins'.

I would like to create a Boolean flag 'compare' that is set to 1 if the words in 'text_dins' are found in 'text', and to 0 otherwise, e.g.

'text' = 'i hate cars'
'text_dins' = 'cars'

This should give 'compare' = 1

'text' = 'i hate cars'
'text_dins' = 'rabbits'

This should give 'compare' = 0

How would I do this in a pandas DataFrame?

Tom

1 Answer

I think you need apply with axis=1 to process the DataFrame row by row, and then test each row with in. Finally, convert the resulting True and False values to 1 and 0 with astype(int) and assign them to a new column:

import pandas as pd

df = pd.DataFrame({'text':['i hate cars','i hate cars'], 'text_dins':['cars', 'rabbits']})
print (df)
          text text_dins
0  i hate cars      cars
1  i hate cars   rabbits

df['new'] = df.apply(lambda x: x['text_dins'] in x['text'] , axis=1).astype(int)
print (df)
          text text_dins  new
0  i hate cars      cars    1
1  i hate cars   rabbits    0
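
Note that in on two strings is a substring test, so a value such as 'car' would also match 'i hate cars'. If whole words are what is wanted (an assumption about the question's intent), a minimal sketch could split both columns on whitespace and check that every word of 'text_dins' appears in 'text'. The helper name all_words_present and the extra sample rows are hypothetical, added only for illustration:

```python
import pandas as pd

# Sample data; the second and third rows are illustrative additions.
df = pd.DataFrame({'text': ['i hate cars', 'i hate cars', 'i hate cars'],
                   'text_dins': ['cars', 'car', 'hate cars']})

def all_words_present(row):
    # Split both strings on whitespace and require every word of
    # 'text_dins' to be a whole word of 'text' (set subset test).
    return set(row['text_dins'].split()) <= set(row['text'].split())

df['new'] = df.apply(all_words_present, axis=1).astype(int)
print(df)
```

With this approach 'car' no longer counts as a match for 'i hate cars', while 'hate cars' still does. Word order and repetition are ignored, which may or may not be what is needed.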

Another solution uses a list comprehension, which works as long as neither column contains NaNs:

df['new'] = [int(find in text) for find, text in zip(df['text_dins'], df['text'])]
print (df)
          text text_dins  new
0  i hate cars      cars    1
1  i hate cars   rabbits    0
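
If the columns may contain NaNs, the plain in test raises a TypeError because NaN is a float, not a string. One possible workaround, sketched here under the assumption that a missing value should simply count as "no match", is to check the types first. The helper name contains and the third sample row are hypothetical:

```python
import numpy as np
import pandas as pd

# Sample data with a missing value in 'text' (illustrative addition).
df = pd.DataFrame({
    'text': ['i hate cars', 'i hate cars', np.nan],
    'text_dins': ['cars', 'rabbits', 'cars'],
})

def contains(find, text):
    # Assumption: any non-string value (e.g. NaN) is treated as "no match".
    return int(isinstance(find, str) and isinstance(text, str) and find in text)

df['new'] = [contains(find, text) for find, text in zip(df['text_dins'], df['text'])]
print(df)
```

Rows with a missing value in either column get 0 instead of raising an error.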
jezrael