In the following DataFrame, I need to find the rows whose 'path' contains every string in the list 'a'.

df = pd.DataFrame({'id' : [1,2,3,4],
                'path'  : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3"]})

I need to check whether both 'p1' and 'p2' are present in each row's 'path'.

a = ['p1','p2']

Something like the following:

if all(x in df.path for x in a):
    print(df)
Nilani Algiriyage

1 Answer

How about this?

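import re

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3"]})

a = ['p1', 'p2']

# Build one lookahead per required string, e.g. (?=.*p1)(?=.*p2),
# escaping each string so regex metacharacters are matched literally.
# see: http://stackoverflow.com/a/470602/1407427
reg_exp = ''.join('(?=.*%s)' % re.escape(i) for i in a)

# alternatively: print(df.path.str.match(reg_exp))
# (older pandas versions needed as_indexer=True here)
print(df.path.str.contains(reg_exp))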

And the result:

0     True
1     True
2    False
3     True
Name: path, dtype: bool
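
One caveat with the regex: it matches substrings, so 'p1' would also be found inside a value such as 'p10'. If whole comma-separated items are what matter, a regex-free variant might look like the sketch below. It assumes the items in 'path' are separated by plain commas with no surrounding spaces; the names `required` and `mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3"]})
a = ['p1', 'p2']

# Assumption: items are comma-separated with no extra whitespace.
required = set(a)

# Split each path into its items and test that every required item is present.
mask = df['path'].apply(lambda s: required.issubset(s.split(',')))

# Keep only the rows that contain all of the required items.
print(df[mask])
```

Here `mask` is a boolean Series just like the output of `str.contains`, so it can be used to filter the DataFrame in the same way.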
Wojciech Walczak
  • Thanks very much! Did you get the result with str.match? I didn't. Anyway, this worked: df.path.str.contains(reg_exp) – Nilani Algiriyage Mar 13 '14 at 08:39
  • It should work with both `str.match` and `str.contains`. Does it throw any errors when you're using `str.match`? – Wojciech Walczak Mar 13 '14 at 08:44
  • Thanks again, you saved me a lot of time. No errors, it just shows empty strings. – Nilani Algiriyage Mar 13 '14 at 08:50
  • I guess that's a matter of the Pandas version you're using. This should work: `df.path.str.match(reg_exp, as_indexer=True)`. See: http://pandas.pydata.org/pandas-docs/stable/basics.html#testing-for-strings-that-match-or-contain-a-pattern. – Wojciech Walczak Mar 13 '14 at 08:51
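
For anyone comparing the two methods discussed in these comments, here is a small check. It is a sketch that assumes a recent pandas release, where `str.match` returns booleans directly and the `as_indexer` argument no longer exists; the names `by_contains` and `by_match` are only illustrative.

```python
import re

import pandas as pd

paths = pd.Series(["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3"],
                  name='path')
a = ['p1', 'p2']

# Same lookahead pattern as in the answer: (?=.*p1)(?=.*p2)
reg_exp = ''.join('(?=.*%s)' % re.escape(i) for i in a)

# str.contains searches anywhere in the string; str.match anchors at the start.
# Each lookahead scans the rest of the string, so both should agree for this pattern.
by_contains = paths.str.contains(reg_exp)
by_match = paths.str.match(reg_exp)

print(by_contains.equals(by_match))
```

The two methods agree here only because the pattern consists entirely of lookaheads; for an ordinary pattern, `str.match` requires the match to begin at the first character.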