Create "or" conditions based on elements in a list that may have variable length

Question

I have a dataframe and want to find all rows where one of the columns contains a certain string:

tmp = data_frame[data_frame["DESC"].str.contains(tag, na=False)]

However, assume that tag is a list, and I want the column to contain any string in the list, for example:

tmp = data_frame[(data_frame["DESC"].str.contains(tag[0], na=False)) | (data_frame["DESC"].str.contains(tag[1], na=False))]

Now, assume that I have a list of lists, and tag is an element in it, and I loop through this list of lists:

for tag in tag_list:
    tmp = data_frame[(data_frame["DESC"].str.contains(tag[0], na=False)) | (data_frame["DESC"].str.contains(tag[1], na=False))]
    # do something with tmp

Now assume further that the elements of tag_list have different lengths, so tag sometimes has 1 element, sometimes 2, sometimes 4, and so on. How can I define tmp so that it does not depend on a fixed length for tag?

Example:

import pandas

tmp = pandas.DataFrame(columns=["DESC"])
tmp.loc[0] = ["Hello"]
tmp.loc[1] = ["Hello"]
tmp.loc[2] = ["Hi"]
tmp.loc[3] = ["Good Morning"]

tag = ["Hi", "Hello"]

tmp2 = tmp[(tmp["DESC"].str.contains(tag[0], na=False)) | (tmp["DESC"].str.contains(tag[1], na=False))]

Possible duplicate of [pandas: test if string contains one of the substrings in a list](https://stackoverflow.com/questions/26577516/pandas-test-if-string-contains-one-of-the-substrings-in-a-list) — pault, Feb 23 '18 at 20:37
The answer in the question I linked as a possible dupe would suggest trying something like: `data_frame[data_frame["DESC"].str.contains('|'.join(tag))]` where `|` is the regex OR operation. — pault, Feb 23 '18 at 20:40
Can't really help you figure out why without an [mcve]. See this post on [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — pault, Feb 23 '18 at 20:53
Here, I added an example. The linked answer is not helping me, but maybe that is because I am slow — user, Feb 23 '18 at 21:10
From your example, `tmp[(tmp["DESC"].str.contains("|".join(tag), na=False))]` works for me. Are you seeing something different? Do any of your tags have special characters like `$,*,.,+` etc? — pault, Feb 23 '18 at 21:16
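A minimal sketch of the approach suggested in the comments above, applied to the example from the question. The `re.escape` step is an addition not present in the comments; it is one way to handle the special-character concern raised there, assuming the tags should be matched as literal text.

```python
import re

import pandas as pd

tmp = pd.DataFrame({"DESC": ["Hello", "Hello", "Hi", "Good Morning"]})
tag = ["Hi", "Hello"]

# Escape each tag so characters such as $ * . + are matched literally,
# then join with "|" (regex OR) so the pattern works for any number of tags.
pattern = "|".join(re.escape(t) for t in tag)

tmp2 = tmp[tmp["DESC"].str.contains(pattern, na=False)]
print(tmp2)
```

Note that an empty `tag` list gives an empty pattern, which matches every non-null row, so that case may need a separate check.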

score 1 · Answer 1 · answered Feb 23 '18 at 21:09

This should work. Can you try it and let me know? I will make corrections if necessary:

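def select_tags(df_line, taglistlist):
    # With axis=1, df_line is a single row, so df_line['DESC'] is a plain
    # value (not a Series) and has no .str accessor.
    desc = df_line['DESC']
    for taglist in taglistlist:
        for tag in taglist:
            # Skip missing values, then do a plain substring test
            if isinstance(desc, str) and tag in desc:
                # INSERT LOGIC HERE
                pass

df.apply(select_tags, args=(taglistlist,), axis=1)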

joaoavf

Thank you for your contribution, the answer in the comment actually works – user Feb 23 '18 at 21:37
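For the loop over tag_list described in the question, the "or" condition can also be built without a regex by combining one boolean mask per tag. This is only a sketch under stated assumptions: the helper name `any_tag_mask` and the sample `tag_list` are hypothetical, and tags are treated as literal substrings (`regex=False`).

```python
import operator
from functools import reduce

import pandas as pd

data_frame = pd.DataFrame({"DESC": ["Hello", "Hello", "Hi", "Good Morning"]})

# Hypothetical list of lists whose elements have different lengths
tag_list = [["Hi"], ["Hi", "Hello"], ["Good", "Hello", "Hi", "xyz"]]


def any_tag_mask(series, tags):
    """Return a boolean mask that is True where the value contains any tag."""
    masks = (series.str.contains(t, na=False, regex=False) for t in tags)
    # Start from an all-False mask so an empty tag list selects no rows.
    return reduce(operator.or_, masks, pd.Series(False, index=series.index))


results = []
for tag in tag_list:
    tmp = data_frame[any_tag_mask(data_frame["DESC"], tag)]
    # do something with tmp; here the matching values are just collected
    results.append(tmp["DESC"].tolist())

print(results)
```

Compared with joining the tags into one pattern, this makes one pass over the column per tag, but it needs no escaping of special characters.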
