-1

I would need to filter columns looking for rows that contain some specific words. My list of datasets include the following

list_datasets=["text1.csv","text2.csv","text3.csv","text4.csv"]

Datasets look like (rows differ; headers - Primary, Secondary, Tertiary - are the same):

    Primary         Secondary  Tertiary 
Fiat has sold...   Toyota....   Nissan....
Ferrari...         Porsche...   N/A

I have tried to filter them as follows:

for item in list_datasets:
            print(item)
            df = pd.read_csv("path"+item)
            add_new_column(df)
     # Filter
        to_search = [] 

        while True:
            q = input("Search term: ")

            if q!='000':
                to_search.append(q)

            if q=='000':
                break
        df['Selected'] = df['Primary, Tertiary'].str.lower().str.contains('|'.join(to_search))
    if df[(df['Selected'] == True)] :
        df['Tested'] = 1
        # do some stuff

My issues are in filtering rows. The code above is giving me the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This corresponds to if df[(df['Selected'] == True)] :

Some context:

What I am trying to do is the following:

For example: in text1, I would like search terms: bmw, jeep, and Toyota; in text2 I would like to search fiat, Chevrolet, Ferrari; and so on. However I do not know how to include it in the code above (I am a bit of confusing about the order). What I would like to do is to use for the part of code where I 'do some stuff' the dataset with filtered rows.

Could you please tell me how to fix it?

still_learning
  • 776
  • 9
  • 32
  • This `if df[(df['Selected'] == True)]`: You can turn a dataframe into `True` or `False` statement. What are you trying to do? – Quang Hoang Apr 26 '20 at 23:29
  • I would like to filter each dataset based on some terms that I will add manually (using input in the while True condition). I have updated the question to better explain it. – still_learning Apr 26 '20 at 23:34
  • Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) – AMC Apr 27 '20 at 00:40
  • @AMC: I have tried with the following: `df1=df[df['Primary'].str.lower().str.contains('|'.join(to_search))]` instead of creating a new column (`Selected`) and set the `if` statement condition. It works. However I do not know how to filter also within `Terziary` – still_learning Apr 27 '20 at 01:05

1 Answers1

1

Dont complicate it. Iterate through csv list and plot each csv and search as per the substring you have defined.

For instance text1 df; Data

df=pd.DataFrame({'Primary':['Fiat','Toyota','Mercedes','bmw', 'jeep', 'Porsche','Nissan','Volvo'],'Secondary':['Fiat','Toyota','bmw', 'jeep','Mercedes', 'Porsche','Nissan','Volvo'],'Tertairy':['Fiat','Toyota','Mercedes', 'Porsche','Nissan','bmw', 'jeep','Volvo']})
df

text1search=['bmw', 'jeep',]# Your search phrase
s="|".join(text1search)#Join the search phrase
m = df['Primary'] + df['Secondary']+ df['Tertairy']#Join all columns values
#df['check']= [y for y in ['|'.join([x for x in r])for r in df.values.tolist()]]
df[m.str.lower().str.contains(s, case=False)]#select the rows
#df = df[df.l.str.contains(s, case=False)]
df

Output

enter image description here

wwnde
  • 26,119
  • 6
  • 18
  • 32