
For a given numpy array:

[[1, 1, 'IGNORE_THIS_COL', 100],
 [1, 1, 'IGNORE_THIS_COL', 101],
 [1, 2, 'IGNORE_THIS_COL', 100]]

Is it possible to sum rows conditionally? Say column 0 is the group and column 1 is the user; I would like to sum the fourth column (index 3) over all rows that share the same group and user. The final 'summed' array should look like this:

[[1, 1, 'IGNORE_THIS_COL', 201],
 [1, 2, 'IGNORE_THIS_COL', 100]]

I have already checked multiple answers, including Numpy: conditional sum.

cs95
DaveIdito

1 Answer


You're looking for a groupby on a subset of columns. This is a challenge to implement with numpy, but is straightforward with a pandas groupby:

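import pandas as pd

# `array` is the array from the question
df = pd.DataFrame(array)
# Group on columns 0 and 1; keep the first value of column 2 and sum column 3
out = df.groupby([0, 1], as_index=False).agg({2: 'first', 3: 'sum'}).values.tolist()

print(out)
# [[1, 1, 'IGNORE_THIS_COL', 201], [1, 2, 'IGNORE_THIS_COL', 100]]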
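If pulling in pandas is not an option, the same grouping can be sketched in plain NumPy with `np.unique` and `np.add.at`. This is only one possible approach, not necessarily the most efficient one. The helper name `group_sum` and its parameters are made up for illustration, and the sketch assumes the array was built with `dtype=object` (without it, NumPy coerces every cell of a mixed int/str array to a string) and that the key and value columns hold integers.

```python
import numpy as np

# Hypothetical input mirroring the question; dtype=object keeps the ints as ints.
array = np.array(
    [[1, 1, 'IGNORE_THIS_COL', 100],
     [1, 1, 'IGNORE_THIS_COL', 101],
     [1, 2, 'IGNORE_THIS_COL', 100]],
    dtype=object,
)

def group_sum(arr, key_cols=(0, 1), sum_col=3):
    """Sum `sum_col` over rows sharing the same values in `key_cols`.

    Other columns take the value from the first row of each group.
    """
    keys = arr[:, list(key_cols)].astype(np.int64)
    values = arr[:, sum_col].astype(np.int64)

    # first_idx: index of the first row of each distinct key combination
    # inverse:   group id for every input row
    _, first_idx, inverse = np.unique(
        keys, axis=0, return_index=True, return_inverse=True
    )
    inverse = inverse.ravel()  # the shape of `inverse` differs across NumPy versions

    # Unbuffered accumulation: repeated group ids are all added.
    sums = np.zeros(len(first_idx), dtype=np.int64)
    np.add.at(sums, inverse, values)

    result = arr[first_idx].copy()
    result[:, sum_col] = sums.tolist()
    return result

print(group_sum(array).tolist())
# [[1, 1, 'IGNORE_THIS_COL', 201], [1, 2, 'IGNORE_THIS_COL', 100]]
```

Groups come back sorted by key, which matches the default ordering of the pandas groupby above.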
cs95