What I'm trying to do is replicate this SQL code to Python:
select column_1, column_2, column_3,
sum(column_4) as sum_column_4, sum(column_5) as sum_column_5
from df
group by 1,2,3;
In other words, I need to make this data frame:
column_1 column_2 colunn_3 column_4 column_5
AA BB CC 5 3
AA BB CC 5 0
AA BB CC 7 3
AA DD EE 5 2
AA DD EE 7 1
DD EE FF 2 8
DD EE FF 1 0
Look like this:
column_1 column_2 colunn_3 sum_column_4 sum_column_5
AA BB CC 17 6
AA DD EE 12 3
DD EE FF 3 8
Also, I'm trying to make this as simple as possible, because I actually have a lot of columns. And I need to have a new Pandas data frame as output. So this is what I've tried:
df.groupby(list(df.columns)[0:3]).sum()
It's almost there, the problem is that the output gets weird, something like:
column_1 column_2 colunn_3 sum_column_4 sum_column_5
AA BB CC 17 6
DD EE 12 3
DD EE FF 3 8
I'm trying different things that I've seen in other posts, like Pandas DataFrame Groupby two columns and get counts and Python Pandas group by multiple columns, mean of another - no group by object, but that didn't work. So if anyone could help me.