Is there an easy way to select the columns of a dataframe whose values contain a certain word (not only that exact word, but also values where it appears together with extra words or numbers)?

I tried one query, but it searched for the word Unknown in the column names, which is not what I want.

df.filter(like='Unknown')

Then I tried a different approach: get all the rows that contain that word, create a dataframe from them, and then get the column names out of it. That didn't work either.

value_list = ['Unknown']
df_unknown = df[df.str.contains(value_list)]

I also tried the following query:

df_uknown = df[df.isin(value_list)]

but it returned the whole dataframe, with every cell either null or Unknown, depending on whether the cell held exactly that word.

I am not sure what to do next. The answer might be very simple, but it eludes me.

Thanks

Ast

1 Answer

I believe you need to build one final pattern from all the words, joined by | (regex OR), and then test it against a column:

value_list = ['Unknown']
pat = '|'.join(r"\b{}\b".format(x) for x in value_list)

df_unknown = df[df['col'].str.contains(pat)]
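
The snippet above checks a single column. If the goal is the column names themselves, the same pattern could be applied to every column, something like the sketch below. The sample data and column names are made up for illustration, and the `re.escape` call and the `astype(str)` cast are my own additions so that special characters and non-string columns don't break the match:

```python
import re
import pandas as pd

# Hypothetical sample data; these column names are assumptions for the demo.
df = pd.DataFrame({
    "city": ["Paris", "Unknown city 12", "Rome"],
    "status": ["ok", "ok", "done"],
    "owner": ["Ann", "Bob", "Unknown"],
    "count": [1, 2, 3],
})

value_list = ["Unknown"]
# Join the words with | (regex OR); re.escape guards against regex metacharacters.
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in value_list)

# For each column: cast to string, then check whether any row matches the pattern.
mask = df.apply(lambda col: col.astype(str).str.contains(pat).any())

# Keep only the labels of the columns where at least one value matched.
matching_columns = mask[mask].index.tolist()
df_unknown = df[matching_columns]

print(matching_columns)  # ['city', 'owner']
```

Here `matching_columns` holds the column names, and `df_unknown` is the dataframe restricted to those columns.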
jezrael