
I have a pandas DataFrame with rows like this:

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    150         0         0         0
1     33     22    300         1         0         1

I want to combine all rows that share the same 'Same1' and 'Same2' values, summing the remaining columns:

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    450         1         0         1

What would be the cleanest way to achieve this using pandas?

Executable python code: https://trinket.io/python3/1da371fd04

mre

2 Answers


You can group by the two key columns and sum the rest:

# Pass the string 'sum' rather than the builtin sum, which newer pandas versions warn about
out = df.groupby(['Same1', 'Same2']).agg('sum').reset_index()
print(out)

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    450         1         0         1
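
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the two rows shown in the question, so the construction step is an assumption about how the real data is loaded.

```python
import pandas as pd

# Sample data reconstructed from the two rows in the question.
df = pd.DataFrame({
    "Same1": [33, 33],
    "Same2": [22, 22],
    "Diff3": [150, 300],
    "Encoded1": [0, 1],
    "Encoded2": [0, 0],
    "Encoded3": [0, 1],
})

# Group on the key columns, sum every other column,
# then turn the group keys back into ordinary columns.
out = df.groupby(["Same1", "Same2"]).agg("sum").reset_index()
print(out)
```

Both rows share the same key pair, so they collapse into a single row.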
Ynjxsjmh

You can use groupby with as_index=False to get the expected result:

df.groupby(['Same1', 'Same2'], as_index=False).sum()

Output:

   Same1  Same2  Diff3  Encoded1  Encoded2  Encoded3
0     33     22    450         1         0         1
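
As a rough illustration of what as_index=False changes, the sketch below compares it with a plain groupby. The third row (keys 44 and 22) is hypothetical and not part of the question's data; it is only there to show that rows with different keys stay in separate groups.

```python
import pandas as pd

# First two rows come from the question; the third row is a
# hypothetical addition with a different key pair.
df = pd.DataFrame({
    "Same1": [33, 33, 44],
    "Same2": [22, 22, 22],
    "Diff3": [150, 300, 10],
    "Encoded1": [0, 1, 0],
    "Encoded2": [0, 0, 1],
    "Encoded3": [0, 1, 0],
})

# Default behaviour: the group keys become a MultiIndex.
grouped = df.groupby(["Same1", "Same2"]).sum()

# as_index=False keeps the keys as regular columns instead.
flat = df.groupby(["Same1", "Same2"], as_index=False).sum()

print(grouped.index.names)
print(flat)
```

With this data, flat should match grouped.reset_index(), so the choice between the two answers is mostly a matter of taste.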
tlentali