I have a pandas DataFrame in which only two columns matter: 'Name' and 'Cost'.
My costs fall into different categories, and for each category I have a list of keywords. Based on these keywords I find the related rows in the DataFrame:
a = df[df['Name'].str.contains('|'.join(keywords),case=False)]
and then I sum the Cost values in those rows to get that category's cost:
sum_ = 0
for index, row in a.iterrows():
    cost = float(row['Cost'])
    sum_ += cost
The problem with this approach is that I never know whether a certain row has been counted in more than one category, or whether a row was missed and never allocated to any category.
My question is, first, how to get the indexes of the rows selected by str.contains, and second, how to check whether those rows have already been used in another category.
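To make the goal concrete, here is a minimal sketch of the kind of bookkeeping I am after. The sample data, the `categories` dict, and all variable names are made up for illustration, and I am not sure this is the best approach. It builds one boolean mask per category, reads the matching index labels off each mask, and counts how many categories each row fell into:

```python
import pandas as pd

# Hypothetical sample data; the real frame has more columns.
df = pd.DataFrame({
    "Name": ["Coffee shop", "Train ticket", "Coffee on train", "Book store"],
    "Cost": ["3.5", "20", "4", "12"],
})

# Hypothetical mapping of category -> keyword list.
categories = {
    "food": ["coffee", "lunch"],
    "transport": ["train", "bus"],
}

# One boolean column per category: True where any keyword matches 'Name'.
# na=False treats missing names as "no match".
masks = pd.DataFrame({
    cat: df["Name"].str.contains("|".join(kws), case=False, na=False)
    for cat, kws in categories.items()
})

# Index labels of the rows selected for a single category.
food_index = df.index[masks["food"]].tolist()

# How many categories each row was assigned to.
hits = masks.sum(axis=1)
duplicated_rows = df.index[hits > 1].tolist()  # counted more than once
missed_rows = df.index[hits == 0].tolist()     # not allocated anywhere

# Per-category totals without an explicit loop over rows.
cost = df["Cost"].astype(float)
totals = {cat: cost[masks[cat]].sum() for cat in categories}
```

With this made-up data, "Coffee on train" matches both categories and "Book store" matches none. Note that the keywords are joined into a regular expression, so a keyword containing characters such as `.` or `(` would presumably need `re.escape` first.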
Thank you so much.