I need to filter columns, looking for rows that contain specific words. My list of datasets includes the following:
list_datasets=["text1.csv","text2.csv","text3.csv","text4.csv"]
The datasets look like this (the rows differ; the headers - Primary, Secondary, Tertiary - are the same):
Primary Secondary Tertiary
Fiat has sold... Toyota.... Nissan....
Ferrari... Porsche... N/A
I have tried to filter them as follows:
for item in list_datasets:
    print(item)
    df = pd.read_csv("path" + item)
    add_new_column(df)
    # Filter
    to_search = []
    while True:
        q = input("Search term: ")
        if q != '000':
            to_search.append(q)
        if q == '000':
            break
    df['Selected'] = df['Primary, Tertiary'].str.lower().str.contains('|'.join(to_search))
    if df[(df['Selected'] == True)]:
        df['Tested'] = 1
        # do some stuff
My issue is in filtering the rows. The code above gives me the following error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This corresponds to the line if df[(df['Selected'] == True)]:
Some context on what I am trying to do:
Each file should have its own search terms. For example, in text1 I would like the search terms bmw, jeep, and Toyota; in text2 I would like to search for fiat, Chevrolet, and Ferrari; and so on. However, I do not know how to include this in the code above (I am a bit confused about the order of the steps). In the part of the code where I 'do some stuff', I would like to work with the dataset containing only the filtered rows.
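To make the per-file terms concrete, I imagine replacing the input() loop with something like the sketch below (the dict and the helper name search_terms are just an illustration; only the text1/text2 lists come from my example above, and I lowercase the terms to match the .str.lower() in my filter):

```python
# Hypothetical mapping: each dataset gets its own list of search terms.
# Terms are lowercase because the filter lowercases the column first.
search_terms = {
    "text1.csv": ["bmw", "jeep", "toyota"],
    "text2.csv": ["fiat", "chevrolet", "ferrari"],
    # "text3.csv": [...], "text4.csv": [...] would follow the same shape
}

# Build the regex alternation for each file, as in '|'.join(to_search)
patterns = {item: '|'.join(terms) for item, terms in search_terms.items()}
print(patterns["text1.csv"])
```

The idea would then be to loop over this dict instead of asking for terms interactively, but I am not sure where this fits relative to the filtering step.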
Could you please tell me how to fix it?