0

I have an excel file, with expenses that I need to group by the same concept. For simplicity, the dataframe would be something like this:

Date Concept Amount
07-12-2021 A 01 -100
01-12-2021 A 02 -150
30-11-2021 B 01 -50
29-11-2021 C 01 -25
23-11-2021 D 01 -70
22-11-2021 C 02 -80
19-11-2021 B 02 -65
18-11-2021 B 03 -25
17-11-2021 A 03 -30
16-11-2021 D 02 -120

The concept 'A 0X', refer to the same general concept 'A' but it has different identifiers. Bear in mind that concept 'A 0X' is for the sake of simplicity, but the concept will be a more complex expression

I am trying using this code to group by the same concept:

import pandas as pd
df = pd.read_excel("Expenses.xlsx")
df.groupby('Concept')['Amount'].apply(list)

The problem is that with this code I can not group all concepts 'A0X', 'B0X', 'C0X' as the same general concept 'A', 'B' or 'C'.

The final result I need would be:

Total expenses

Concept Amount
A 280
B 140
C 105
D 190

What I need is to know how to implement the groupby method with regex expressions.

nekovolta
  • 496
  • 2
  • 14

1 Answers1

1

If you want to group by the first letter of the Concept column, you can use the pandas string functions in a groupby.

df_total = df.groupby(df['Concept'].str.split(' ').str[0]).sum().reset_index()

This returns:

>>> df_total
    Concept Amount
0   A       -280
1   B       -140
2   C       -105
3   D       -190

If you want the absolute value of the Amount column:

df_total['Amount'] = df_total['Amount'].abs()
sunnytown
  • 1,844
  • 1
  • 6
  • 13
  • It definitely works as your solution. However, the concept 'A 0X' is for the sake of simplicity. In the real case, the concept is much more complex, and I think I should attempt that using regex – nekovolta Dec 08 '21 at 12:27