I have a dataframe and want to find all rows where one of the columns contains a certain string:

tmp = data_frame[data_frame["DESC"].str.contains(tag, na=False)]

Now assume that tag is a list, and I want the rows where the column contains any string in the list. For example, with two elements:

tmp = data_frame[(data_frame["DESC"].str.contains(tag[0], na=False)) | (data_frame["DESC"].str.contains(tag[1], na=False))]

Now, assume that I have a list of lists, and tag is an element in it, and I loop through this list of lists:

for tag in tag_list:
    tmp = data_frame[(data_frame["DESC"].str.contains(tag[0], na=False)) | (data_frame["DESC"].str.contains(tag[1], na=False))]
    # do something with tmp

Now assume further that the elements of tag_list have different lengths, so tag sometimes has 1 element, sometimes 2, sometimes 4, and so on. How can I define tmp so that it does not depend on a fixed length for tag?

Example:

import pandas

tmp = pandas.DataFrame(columns=["DESC"])
tmp.loc[0] = ["Hello"]
tmp.loc[1] = ["Hello"]
tmp.loc[2] = ["Hi"]
tmp.loc[3] = ["Good Morning"]

tag = ["Hi", "Hello"]

tmp2 = tmp[(tmp["DESC"].str.contains(tag[0], na=False)) | (tmp["DESC"].str.contains(tag[1], na=False))]
user
  • Possible duplicate of [pandas: test if string contains one of the substrings in a list](https://stackoverflow.com/questions/26577516/pandas-test-if-string-contains-one-of-the-substrings-in-a-list) – pault Feb 23 '18 at 20:37
  • The answer in the question I linked as a possible dupe would suggest trying something like: `data_frame[data_frame["DESC"].str.contains('|'.join(tag))]` where `|` is the regex OR operation. – pault Feb 23 '18 at 20:40
  • Can't really help you figure out why without an [mcve]. See this post on [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – pault Feb 23 '18 at 20:53
  • Here, added an example, the linked answer is not helping me, but maybe because I am slow – user Feb 23 '18 at 21:10
  • From your example, `tmp[(tmp["DESC"].str.contains("|".join(tag), na=False))]` works for me. Are you seeing something different? Do any of your tags have special characters like `$,*,.,+` etc? – pault Feb 23 '18 at 21:16
  • My bad, you are right! – user Feb 23 '18 at 21:19
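
The suggestion from the comments above can be sketched as follows for tag lists of any length. This is a minimal sketch, not a definitive implementation: the helper name `rows_matching_any`, the sample `tag_list`, and the use of `re.escape` (to make special characters such as `$`, `*`, `.`, `+` match literally) are assumptions added for illustration.

```python
import re

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"DESC": ["Hello", "Hello", "Hi", "Good Morning"]})

# Hypothetical list of tag lists with differing lengths.
tag_list = [["Hi", "Hello"], ["Good"], ["Hi", "Morning", "Bye", "C++"]]


def rows_matching_any(frame, tags, column="DESC"):
    """Return the rows whose `column` contains at least one string in `tags`."""
    # Join the tags with the regex OR operator; re.escape keeps characters
    # such as $ * . + from being interpreted as regex syntax.
    pattern = "|".join(re.escape(t) for t in tags)
    return frame[frame[column].str.contains(pattern, na=False)]


for tag in tag_list:
    tmp = rows_matching_any(df, tag)
    # ... do something with tmp
```

Note that an empty tag list produces an empty pattern, which matches every non-missing string, so it may be worth skipping empty lists before calling the helper.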

1 Answer

This should work. Please try it and let me know; I will make corrections if necessary:

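def select_tags(df_line, taglistlist):
    # With axis=1, df_line is one row, so df_line['DESC'] is a scalar
    # (not a Series) and has no .str accessor; use a substring test instead.
    desc = df_line['DESC']
    for taglist in taglistlist:
        for tag in taglist:
            if isinstance(desc, str) and tag in desc:
                # INSERT LOGIC HERE
                pass

df.apply(select_tags, args=(taglistlist,), axis=1)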
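
As an alternative to the row-wise function above, the boolean masks can be combined column-wise, which generalizes the chained `|` expression from the question to any number of tags. This is only a sketch: the helper name `mask_for` and the sample data are assumptions, and `regex=False` is used so that every tag is matched as a literal substring.

```python
import pandas as pd

# Sample data modeled on the question, plus a missing value.
df = pd.DataFrame({"DESC": ["Hello", "Hello", "Hi", "Good Morning", None]})

# Hypothetical list of tag lists with differing lengths.
taglistlist = [["Hi", "Hello"], ["Good"], ["Hi", "Morning", "Bye"]]


def mask_for(frame, taglist, column="DESC"):
    """Boolean mask: True where `column` contains any tag as a literal substring."""
    # Start from an all-False mask and OR in one mask per tag.
    mask = pd.Series(False, index=frame.index)
    for tag in taglist:
        mask |= frame[column].str.contains(tag, na=False, regex=False)
    return mask


for taglist in taglistlist:
    tmp = df[mask_for(df, taglist)]
    # ... do something with tmp
```

Starting from an all-False mask means an empty tag list selects no rows.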
joaoavf