
I have the following dataframe:

id  pattern1    pattern2    pattern3
 1  a-b-c       a-b--       a-b-c
 2  a-a--       a-b--       a-c--
 3  a-v--       a-m--       a-k--
 4  a-b--       a-n--       a-n-c

I want to keep only the rows in which every column ends with the pattern --. In this case the output would be:

 2  a-a--       a-b--       a-c--
 3  a-v--       a-m--       a-k--

So far I can only think of doing something like the following:

df[(len(df['pattern1'].str.split('--')[1])==0) & \
   (len(df['pattern2'].str.split('--')[1])==0) & \
   (len(df['pattern3'].str.split('--')[1])==0)]

This doesn't work. Also, I can't write out the names of all the columns, as there are 20 of them. How can I filter rows where all the columns in that row match a certain pattern/condition?

cs95
TLanni

1 Answer


Start by setting "id" as the index, if you have not already done so, so that it is not tested against the pattern.

df = df.set_index('id')
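
To make the options below reproducible, here is a minimal sketch that rebuilds the question's sample data before setting the index. The values are copied from the table in the question; the construction via a dict of lists is just one convenient way to do it.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'pattern1': ['a-b-c', 'a-a--', 'a-v--', 'a-b--'],
    'pattern2': ['a-b--', 'a-b--', 'a-m--', 'a-n--'],
    'pattern3': ['a-b-c', 'a-c--', 'a-k--', 'a-n-c'],
})

# Move "id" out of the data columns so that only the pattern columns are tested.
df = df.set_index('id')
print(df)
```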

One option is to check every cell with applymap, calling str.endswith on each string (note that applymap is deprecated since pandas 2.1 in favor of DataFrame.map):

df[df.applymap(lambda x: x.endswith('--')).all(axis=1)]

   pattern1 pattern2 pattern3
id                           
2     a-a--    a-b--    a-c--
3     a-v--    a-m--    a-k--

Another option is apply, which calls pd.Series.str.endswith once per column:

df[df.apply(lambda x: x.str.endswith('--')).all(axis=1)]

   pattern1 pattern2 pattern3
id                           
2     a-a--    a-b--    a-c--
3     a-v--    a-m--    a-k--
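
One case the question does not mention is missing values. As a hedged sketch, assuming some cells may be NaN (the small frame below is hypothetical, not the question's data), the column-wise option can pass na=False to str.endswith so that missing cells count as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing cell; not taken from the question.
df = pd.DataFrame({
    'pattern1': ['a-a--', 'a-v--', np.nan],
    'pattern2': ['a-b--', 'a-m-c', 'a-n--'],
}, index=pd.Index([1, 2, 3], name='id'))

# na=False makes missing cells evaluate to False instead of NaN,
# so a row with a missing value is never kept.
mask = df.apply(lambda col: col.str.endswith('--', na=False)).all(axis=1)
result = df[mask]
print(result)
```

The plain x.endswith('--') calls in the other options would raise an AttributeError on a float NaN, so this variant is the safer choice if the data is not guaranteed to be complete.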

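Lastly, for performance, you can build one boolean mask per column in a list comprehension and AND them together with np.logical_and.reduce:

import numpy as np

# Alternative: one vectorized str.endswith call per column.
# m = np.logical_and.reduce([df[c].str.endswith('--') for c in df.columns])
# Looping over the values directly, which tends to be faster for string operations:
m = np.logical_and.reduce([
    [x.endswith('--') for x in df[c]] for c in df.columns])
m
# array([False,  True,  True, False])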

df[m]
   pattern1 pattern2 pattern3
id                           
2     a-a--    a-b--    a-c--
3     a-v--    a-m--    a-k--

If there are other columns, but you only want to consider those named "pattern*", you can use filter on the DataFrame:

u = df.filter(like='pattern')

Now repeat the options above using u to build the mask. For example, the first option becomes

df[u.applymap(lambda x: x.endswith('--')).all(axis=1)]

...and so on.
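
Putting the pieces together, here is a rough end-to-end sketch that combines filter with the logical_and.reduce option. The "note" column is a hypothetical extra column added only to show that non-pattern columns are ignored by the mask but kept in the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'pattern1': ['a-b-c', 'a-a--', 'a-v--', 'a-b--'],
    'pattern2': ['a-b--', 'a-b--', 'a-m--', 'a-n--'],
    'pattern3': ['a-b-c', 'a-c--', 'a-k--', 'a-n-c'],
    'note': ['w', 'x', 'y', 'z'],  # hypothetical extra column, not in the question
}).set_index('id')

# Consider only the columns whose name contains "pattern".
u = df.filter(like='pattern')

# One list of booleans per column, ANDed element-wise across columns.
m = np.logical_and.reduce([
    [x.endswith('--') for x in u[c]] for c in u.columns])

# Index the full frame with the mask so the extra column is retained.
result = df[m]
print(result)
```

Because the mask is built from u but applied to df, this scales to any number of pattern columns without naming them individually.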

cs95
Why would I suggest loops here? If you're interested, read my writeup at [For loops with pandas - When should I care?](https://stackoverflow.com/questions/54028199/for-loops-with-pandas-when-should-i-care) – cs95 Jan 16 '19 at 03:21