2

I have a list of values that I would like to use to select rows in a dataframe. The trick is I would like to select any row where the list value is in the row. Example:

index    color    shape
 1       blue     star
 2       red      square
 3       yellow   circle

My list would be

list_vals = ['sq', 'blu']

I would like to select the rows

index    color   shape
1        blue    star
2        red     square
magladde
  • 614
  • 5
  • 23

4 Answers4

3

Use DataFrame.stack to convert to a Series, then use Series.str.contains to find the strings your interested in - we'll use '|'.join to create a regex 'OR' patter combining all items from list_items.

For reference, this regex pattern looks like 'sq|blu' in this case.

Next, Series.unstack to get back to original shape and use DataFrame.any over axis 1 to create the boolean index we'll use to return the desired rows.

df[df.stack().str.contains('|'.join(list_vals)).unstack().any(1)]

[out]

   ndex color   shape
0     1  blue    star
1     2   red  square
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
2

Here is one approach

df_filtered = (
    df[(df['color'].str.contains(list_vals[0])) |
        (df['shape'].str.contains(list_vals[1]))
        ]
                )

print(df_filtered)
   index color   shape
0      1  blue    star
1      2   red  square

EDIT

Another approach is based on this SO post (which contains the full explanation of this method)

  • the only changes I made were (1) to join your search list into a single search string and (2) to return the DataFrame (row) index of the search (filtered) results (this is then used to slice the original DataFrame)
def find_subtext(df, txt):
    contains = df.stack().str.contains(txt).unstack()
    return contains[contains.any(1)].index
df_filtered = find_subtext(df, '|'.join(list_vals))

print(df.iloc[df_filtered, :])
   index color   shape
0      1  blue    star
1      2   red  square
edesz
  • 11,756
  • 22
  • 75
  • 123
2
df[df['shape'].apply(lambda x: any(s in x[:len(s)] for s in list_vals))]

Output

  color   shape
1   red  square
iamklaus
  • 3,720
  • 2
  • 12
  • 21
2

Or join the list with a pipe and check with str.contains() over the df:

df[df.apply(lambda x: x.str.contains('|'.join(list_vals))).any(axis=1)]

       color   shape
index              
1      blue    star
2       red  square
anky
  • 74,114
  • 11
  • 41
  • 70