
Hi, I have a list of keywords:

keyword_list=['one','two']

and a DataFrame, DF:

    Name      Description
    Sri       Sri is one of the good singer in this two
    Ram       Ram is one of the good cricket player

I want to find the rows whose Description contains all the values from my keyword_list.

My desired output is:

output_Df,
    Name    Description
    Sri     Sri is one of the good singer in this two

I tried mask = DF['Description'].str.contains(), but I can only do this for a single word. Please help.
Pyd

1 Answer


Use np.logical_and.reduce to combine the boolean masks created by a list comprehension, one mask per keyword:

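import numpy as np

keyword_list = ['one', 'two']

# df is the DataFrame from the question (DF there)
# build one boolean mask per keyword, then AND them together row by row
m = np.logical_and.reduce([df['Description'].str.contains(x) for x in keyword_list])
df1 = df[m]
print(df1)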

  Name                                Description
0  Sri  Sri is one of the good singer in this two
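For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample in the question. The helper name `rows_with_all_keywords` and the `regex=False` / `na=False` arguments are my own additions for illustration, not part of the snippet above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Sri", "Ram"],
    "Description": [
        "Sri is one of the good singer in this two",
        "Ram is one of the good cricket player",
    ],
})
keyword_list = ["one", "two"]


def rows_with_all_keywords(frame, column, keywords):
    """Hypothetical helper: keep rows whose `column` contains every keyword."""
    masks = [
        # regex=False treats the keyword as literal text;
        # na=False makes missing values count as "no match"
        frame[column].str.contains(kw, regex=False, na=False)
        for kw in keywords
    ]
    # Element-wise AND across all per-keyword masks
    return frame[np.logical_and.reduce(masks)]


result = rows_with_all_keywords(df, "Description", keyword_list)
print(result)
```

With the sample data only the "Sri" row should remain, since the "Ram" description contains "one" but not "two".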

Alternatives for mask:

m = np.all([df['Description'].str.contains(x) for x in keyword_list], axis=0)

#if no NaNs
m = [set(x.split()) >= set(keyword_list) for x in df['Description']]
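Note that the set-based mask is not a drop-in replacement for the `str.contains` ones: `str.contains` matches substrings, while `split()` compares whole whitespace-separated words. The sketch below uses made-up descriptions (not from the question) to show the difference. The `fillna("")` call is my assumption for coping with missing values.

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the question
df = pd.DataFrame({
    "Description": [
        "one and two together",
        "someone bought two tickets",  # 'one' appears only inside 'someone'
        None,                          # missing value
    ],
})
keyword_list = ["one", "two"]

# Substring matching: every keyword must occur somewhere in the text
substring_mask = np.all(
    [df["Description"].str.contains(kw, regex=False, na=False) for kw in keyword_list],
    axis=0,
)

# Whole-word matching: every keyword must be one of the split words;
# missing values are replaced by "" so that split() does not fail
word_mask = [
    set(text.split()) >= set(keyword_list)
    for text in df["Description"].fillna("")
]

print(substring_mask.tolist())  # [True, True, False]
print(word_mask)                # [True, False, False]
```

The second row is the interesting one: it passes the substring test because "someone" contains "one", but fails the whole-word test.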
jezrael
  • Your solution to the above question works fine. Could you please suggest the best tutorials for numpy and pandas? I want to get good at them. – Pyd Oct 03 '17 at 06:47
  • Hard question. In my opinion, the best are the pandas documentation and tutorials; I especially like [modern pandas](http://pandas.pydata.org/pandas-docs/stable/tutorials.html#modern-pandas). – jezrael Oct 03 '17 at 06:49
  • I did. I was told to wait for some time before accepting it. – Pyd Oct 03 '17 at 12:19