I have a DataFrame with a column named level:

                     level 

0                     HH
1                     FF
2                     FF
3       C,NN-FRAC,W-PROC
4                    C,D
              ...       
8433            C,W-PROC
8434                 C,D
8435                   D
8436                 C,Q
8437                C,HH

I would like to keep only the rows that contain at least one of these specific strings:

searchfor = ['W','W-OFFSH','W-ONSH','W-GB','W-PROC','W-NGTC','W-TRANS','W-UNSTG','W-LNGSTG','W-LNGIE','W-LDC','X','Y','LL','MM','MM – REF','MM – IMP','MM – EXP','NN','NN-FRAC','NN-LDC','OO'] 

which should give me (from the above extract):

                     level 
1       C,NN-FRAC,W-PROC
2       C,W-PROC

I tried these two string filters, but neither gives me the expected result.

df = df[df['industrytype'].str.contains(searchfor)]

df = df[df['industrytype'].str.contains(','.join(searchfor))]
Jason Aller
JEG
  • Please share a [small sample of your DataFrame](https://stackoverflow.com/a/20159305/5901382) for posters to test out possible solutions. – Abirbhav G. Oct 12 '22 at 12:15

1 Answer

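It is probably not behaving as expected because each value holds several comma-separated codes, while str.contains expects a single pattern string rather than a list, and ','.join(searchfor) builds one long string that no row contains. You can write a simple function that splits each value at the commas and checks every piece against searchfor, then use the apply method to run that function on the column.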

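def has_match(x):
  # Split the value at commas and check each code against searchfor.
  # (Named has_match to avoid shadowing Python's built-in filter.)
  for code in x.split(','):
    if code in searchfor:
      return True
  return False

# Keep only the rows where at least one code is in searchfor.
df = df[df.industrytype.apply(has_match)]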
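
To check the split-and-compare idea without a DataFrame, here is a minimal sketch in plain Python. It assumes the values are plain strings with no missing entries. The names levels, wanted and has_match are illustrative and do not come from the question; the sample values are copied from its extract.

```python
# Codes to keep, copied from the question.
searchfor = ['W', 'W-OFFSH', 'W-ONSH', 'W-GB', 'W-PROC', 'W-NGTC', 'W-TRANS',
             'W-UNSTG', 'W-LNGSTG', 'W-LNGIE', 'W-LDC', 'X', 'Y', 'LL', 'MM',
             'MM – REF', 'MM – IMP', 'MM – EXP', 'NN', 'NN-FRAC', 'NN-LDC', 'OO']

# Sample values from the question's extract (assumed to be plain strings).
levels = ['HH', 'FF', 'FF', 'C,NN-FRAC,W-PROC', 'C,D',
          'C,W-PROC', 'C,D', 'D', 'C,Q', 'C,HH']

# A set gives fast membership checks; this is an optional optimisation.
wanted = set(searchfor)

def has_match(cell):
    """Return True if any comma-separated code in cell is in wanted."""
    # strip() tolerates stray spaces around a code, e.g. 'C, W-PROC'.
    return any(code.strip() in wanted for code in cell.split(','))

kept = [cell for cell in levels if has_match(cell)]
print(kept)  # ['C,NN-FRAC,W-PROC', 'C,W-PROC']
```

The same function can be passed to apply on the column, as the answer does. Because whole codes are compared, 'W' only matches a code that is exactly 'W', not a longer code that merely contains that letter.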