Find substring in list in a pandas.Series in Python

Question

I have a pandas DataFrame where one column contains lists. I want to search every list (= every row) and check whether its elements contain all of the given substrings (each substring may appear in a different element of the list).

Data:

import pandas as pd

list_Series = pd.Series([["handful of tomatos", "2 peppers", " tsp salt"],
                        ["1 kg of meat", "fresh basil"]])

Search words:

search_for = ["pepper", "salt"]

Desired output for 'list_Series':

True
False

Now I want to apply a (possibly vectorized?) function that checks whether a Series element contains all of the search substrings. If the Series contained only strings rather than lists, I would use pd.Series.str.contains("salt"). For a single list, I would do:

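def filterlist(liste, searchwords):
    occurs = 0
    for word in searchwords:
        for string in liste:
            if word.lower() in string.lower():
                occurs += 1
                break  # this word was found; continue with the next search word
    # True only if every search word was found in at least one string of the list
    return occurs == len(searchwords)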

But this is clunky and long, and probably not very efficient when applied to a whole pd.Series. I also don't know how to apply it to a Series.
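For reference, a minimal sketch of how a per-list check could be applied to every row, assuming the same list_Series and search_for as above. The helper name contains_all is hypothetical (not from the question); it expresses the nested loops with all() and any(). It is applied once with Series.apply and once with a plain list comprehension wrapped back into a Series, which is the style the comment below suggests can be faster than the .str methods.

```python
import pandas as pd

list_Series = pd.Series([["handful of tomatos", "2 peppers", " tsp salt"],
                         ["1 kg of meat", "fresh basil"]])
search_for = ["pepper", "salt"]


def contains_all(strings, searchwords):
    """Hypothetical helper: True if every search word occurs
    (case-insensitively) in at least one string of `strings`."""
    lowered = [s.lower() for s in strings]
    return all(any(word.lower() in s for s in lowered) for word in searchwords)


# Option 1: Series.apply calls the helper once per row (i.e. once per list);
# extra positional arguments are passed through `args`.
mask_apply = list_Series.apply(contains_all, args=(search_for,))

# Option 2: a plain list comprehension, wrapped back into a Series
# so that the original index is preserved.
mask_comp = pd.Series([contains_all(lst, search_for) for lst in list_Series],
                      index=list_Series.index)

print(mask_apply.tolist())  # [True, False]
```

Both variants run ordinary Python code per row, so neither is truly vectorized; the resulting boolean Series can be used directly as a filter, e.g. list_Series[mask_apply].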

Thanks for the help! This is my first post on Stack Overflow, so feedback is welcome. Also, would it be better to convert this Series into a DataFrame?
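On the DataFrame question, one possible (swapped-in, not from the thread) approach is to "flatten" the lists with Series.explode, so that the usual .str.contains can be used, and then regroup by the original row label. This is a sketch under the assumption that a per-row True/False for "all words found" is the goal; found_per_word and mask are illustrative names.

```python
import pandas as pd

list_Series = pd.Series([["handful of tomatos", "2 peppers", " tsp salt"],
                         ["1 kg of meat", "fresh basil"]])
search_for = ["pepper", "salt"]

# explode() gives one row per list element; the original row label is repeated
# (an empty list becomes a single NaN row).
exploded = list_Series.explode()

# For each search word: does any element of the original row contain it?
# regex=False treats the word literally; na=False turns NaN (empty lists) into False.
found_per_word = [
    exploded.str.contains(word, case=False, regex=False, na=False)
            .groupby(level=0).any()
    for word in search_for
]

# One column per search word; a row matches only if all words were found.
mask = pd.concat(found_per_word, axis=1).all(axis=1)
print(mask.tolist())  # [True, False]
```

This keeps the substring test inside pandas, but the explode and groupby steps add their own overhead, so it is not necessarily faster than a list comprehension.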

You can't vectorize this. In fact, plenty of the `.str` methods run slower than list comprehensions – roganjosh Oct 11 '20 at 13:34

zabop · Answer 1 · 2020-10-11T13:41:46.527


You can use nested list comprehensions:

result = [listelement
          for searchtarget in search_for
          for each_list_in_series in list_Series
          for listelement in each_list_in_series
          if searchtarget in listelement]

result will be:

['2 peppers', ' tsp salt']

Without list comprehensions, this is equivalent to:

result = []
for searchtarget in search_for:
    for each_list_in_series in list_Series:
        for listelement in each_list_in_series:
            if searchtarget in listelement:
                result.append(listelement)
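The comprehension above collects the matches from all rows into one flat list, so the row they came from is lost. A minimal sketch of a variant that keeps one list of matches per row (so the result lines up with list_Series); matches_per_row is an illustrative name, and, like the answer, the check is case-sensitive.

```python
import pandas as pd

list_Series = pd.Series([["handful of tomatos", "2 peppers", " tsp salt"],
                         ["1 kg of meat", "fresh basil"]])
search_for = ["pepper", "salt"]

# Outer comprehension: one entry per row.
# Inner comprehension: the elements of that row containing any search target.
matches_per_row = [
    [listelement for listelement in each_list_in_series
     if any(searchtarget in listelement for searchtarget in search_for)]
    for each_list_in_series in list_Series
]
print(matches_per_row)  # [['2 peppers', ' tsp salt'], []]

# Wrapping the result in a Series with the same index allows adding it as a column.
matches_series = pd.Series(matches_per_row, index=list_Series.index)
```

Note that a non-empty list of matches only means that at least one target was found; checking that every target occurs still requires testing each target separately.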

A nice visual aid for nested list comprehensions can be found in Rahul's answer to this question.

answered Oct 11 '20 at 13:34 by zabop, edited Oct 11 '20 at 13:41

That comprehension just crashes my brain. The overuse of `each` is borderline unintelligible – roganjosh Oct 11 '20 at 13:35
I am looking for your better alternative :) – zabop Oct 11 '20 at 13:36
Better names at the very least? It's horrible to look at for now; I can't even process it properly – roganjosh Oct 11 '20 at 13:36
Better names done, now working on an explanation, few mins... – zabop Oct 11 '20 at 13:40
Thanks for the answer, and posting the equivalent! That is a lot easier to understand! – CloudyBeginner Oct 11 '20 at 13:41
Edited the answer; I hope both of you like it better now :) – zabop Oct 11 '20 at 13:45
