Group dataframe by similar rows

Question

I have an excel file, with expenses that I need to group by the same concept. For simplicity, the dataframe would be something like this:

Date	Concept	Amount
07-12-2021	A 01	-100
01-12-2021	A 02	-150
30-11-2021	B 01	-50
29-11-2021	C 01	-25
23-11-2021	D 01	-70
22-11-2021	C 02	-80
19-11-2021	B 02	-65
18-11-2021	B 03	-25
17-11-2021	A 03	-30
16-11-2021	D 02	-120

The concept 'A 0X', refer to the same general concept 'A' but it has different identifiers. Bear in mind that concept 'A 0X' is for the sake of simplicity, but the concept will be a more complex expression

I am trying using this code to group by the same concept:

import pandas as pd
df = pd.read_excel("Expenses.xlsx")
df.groupby('Concept')['Amount'].apply(list)

The problem is that with this code I can not group all concepts 'A0X', 'B0X', 'C0X' as the same general concept 'A', 'B' or 'C'.

The final result I need would be:

Total expenses

Concept	Amount
A	280
B	140
C	105
D	190

What I need is to know how to implement the groupby method with regex expressions.

Have you tried using regex (import re) - example of how to use regex with pandas — NeoTheNerd, Dec 08 '21 at 11:22

score 1 · Answer 1 · answered Dec 08 '21 at 12:20

1

If you want to group by the first letter of the Concept column, you can use the pandas string functions in a groupby.

df_total = df.groupby(df['Concept'].str.split(' ').str[0]).sum().reset_index()

This returns:

>>> df_total
    Concept Amount
0   A       -280
1   B       -140
2   C       -105
3   D       -190

If you want the absolute value of the Amount column:

df_total['Amount'] = df_total['Amount'].abs()

answered Dec 08 '21 at 12:20

sunnytown

1,844
1
6
13

It definitely works as your solution. However, the concept 'A 0X' is for the sake of simplicity. In the real case, the concept is much more complex, and I think I should attempt that using regex – nekovolta Dec 08 '21 at 12:27

Group dataframe by similar rows

1 Answers1