Pandas: Grouping rows by list in CSV file?

Question

In an effort to make our budgeting life a bit easier and help myself learn; I am creating a small program in python that takes data from our exported bank csv.

I will give you an example of what I want to do with this data. Say I want to group all of my fast food expenses together. There are many different names with different totals in the description column but I want to see it all tabulated as one "Fast Food " expense.

For instance the Csv is setup like this:

Date             Description           Debit    Credit
1/20/20      POS PIN BLAH BLAH ###     1.75      NaN

I figured out how to group them with an or statement:

contains = df.loc[df['Description'].str.contains('food court|whataburger', flags = re.I, regex = True)]

I ultimately would like to have it read off of a list? I would like to group all my expenses into categories and check those category variable names so that it would only output from that list.

I tried something like:

fast_food = ['Macdonald', 'Whataburger', 'pizza hut']

That obviously didn't work.

If there is a better way of doing this I am wide open to suggestions.

Also I have looked through quite a few posts here on stack and have yet to find the answer (although I am sure I overlooked it)

Any help would be greatly appreciated. I am still learning.

Thanks

Are you saying you want to group the list of `fast_food` and search it by `str.contains`? — Henry Yik, Feb 25 '20 at 15:41
you can create a map from all the fast food places but there is no automatic way to do this (bar using NLP) or having a list of every place you eat or could eat in a list to filter it by beore a groupby — Umar.H, Feb 25 '20 at 15:43
Right. I also want it to look at that row's debit category and give me the sum of all in that list. I can't seem to get str.contains to work alongside a list to do that though. — Ricky Long, Feb 25 '20 at 15:43
`...str.contains('|'.join(fast_food), flags = re.I, regex = True)`? — Henry Yik, Feb 25 '20 at 15:45
@HenryYik that worked, is there a way to make them show me their individual sums once grouped? When I just add .sum() to the end it works but it doesn't show me all the data. It is a bit jumbled up. For instance (All macdonalds = 30.00) — Ricky Long, Feb 25 '20 at 15:49

Henry Yik · Accepted Answer · 2020-02-25T17:06:27.553

1

You can assign a new column using str.extract and then groupby:

df = pd.DataFrame({"description":['Macdonald something', 'Whataburger something', 'pizza hut something',
                                  'Whataburger something','Macdonald something','Macdonald otherthing',],
                   "debit":[1.75,2.0,3.5,4.5,1.5,2.0]})

fast_food = ['Macdonald', 'Whataburger', 'pizza hut']

df["found"] = df["description"].str.extract(f'({"|".join(fast_food)})',flags=re.I)

print (df.groupby("found").sum())

#
             debit
found             
Macdonald     5.25
Whataburger   6.50
pizza hut     3.50

edited Feb 25 '20 at 17:06

answered Feb 25 '20 at 16:00

Henry Yik

22,275
4
18
40

I guess I was not clear enough. I apologize. I want the sum of their corresponding categories "Debit" or "Credit" that is on the same row, not the number of times Macdonalds show up. – Ricky Long Feb 25 '20 at 17:00
Thank you @HenryYik – Ricky Long Feb 25 '20 at 17:39

score 0 · Answer 2 · answered Feb 25 '20 at 15:46

Use dynamic pattern building:

fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, fast_food)))    
contains = df.loc[df['Description'].str.contains(pattern, flags = re.I, regex = True)]

The \b word boundaries find whole words, not partial words.

The re.escape will protect special characters and they will be parsed as literal characters.

If \b does not work for you, check other approaches at Match a whole word in a string using dynamic regex

Pandas: Grouping rows by list in CSV file?

2 Answers2