DataFrame is ambiguous when filtering rows

Question

I would need to filter columns looking for rows that contain some specific words. My list of datasets include the following

list_datasets=["text1.csv","text2.csv","text3.csv","text4.csv"]

Datasets look like (rows differ; headers - Primary, Secondary, Tertiary - are the same):

    Primary         Secondary  Tertiary 
Fiat has sold...   Toyota....   Nissan....
Ferrari...         Porsche...   N/A

I have tried to filter them as follows:

for item in list_datasets:
            print(item)
            df = pd.read_csv("path"+item)
            add_new_column(df)
     # Filter
        to_search = [] 

        while True:
            q = input("Search term: ")

            if q!='000':
                to_search.append(q)

            if q=='000':
                break
        df['Selected'] = df['Primary, Tertiary'].str.lower().str.contains('|'.join(to_search))
    if df[(df['Selected'] == True)] :
        df['Tested'] = 1
        # do some stuff

My issues are in filtering rows. The code above is giving me the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This corresponds to if df[(df['Selected'] == True)] :

Some context:

What I am trying to do is the following:

For example: in text1, I would like search terms: bmw, jeep, and Toyota; in text2 I would like to search fiat, Chevrolet, Ferrari; and so on. However I do not know how to include it in the code above (I am a bit of confusing about the order). What I would like to do is to use for the part of code where I 'do some stuff' the dataset with filtered rows.

Could you please tell me how to fix it?

This `if df[(df['Selected'] == True)]`: You can turn a dataframe into `True` or `False` statement. What are you trying to do? — Quang Hoang, Apr 26 '20 at 23:29
I would like to filter each dataset based on some terms that I will add manually (using input in the while True condition). I have updated the question to better explain it. — still_learning, Apr 26 '20 at 23:34
Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) — AMC, Apr 27 '20 at 00:40
@AMC: I have tried with the following: `df1=df[df['Primary'].str.lower().str.contains('|'.join(to_search))]` instead of creating a new column (`Selected`) and set the `if` statement condition. It works. However I do not know how to filter also within `Terziary` — still_learning, Apr 27 '20 at 01:05

wwnde · Accepted Answer · 2020-04-27T02:10:15.907

Dont complicate it. Iterate through csv list and plot each csv and search as per the substring you have defined.

For instance text1 df; Data

df=pd.DataFrame({'Primary':['Fiat','Toyota','Mercedes','bmw', 'jeep', 'Porsche','Nissan','Volvo'],'Secondary':['Fiat','Toyota','bmw', 'jeep','Mercedes', 'Porsche','Nissan','Volvo'],'Tertairy':['Fiat','Toyota','Mercedes', 'Porsche','Nissan','bmw', 'jeep','Volvo']})
df

text1search=['bmw', 'jeep',]# Your search phrase
s="|".join(text1search)#Join the search phrase
m = df['Primary'] + df['Secondary']+ df['Tertairy']#Join all columns values
#df['check']= [y for y in ['|'.join([x for x in r])for r in df.values.tolist()]]
df[m.str.lower().str.contains(s, case=False)]#select the rows
#df = df[df.l.str.contains(s, case=False)]
df

Output

DataFrame is ambiguous when filtering rows

1 Answers1