2

Here is the input dataframe:

  id  val
0  A    1
1  B    2
2  A   -3
3  C    1
4  D    5
5  B    6
6  C   -2

I would like to group entries by id, and then calculate a running sum of the most recent members of each group seen up to this point. Here is how the desired output would look like, with explanations how it is obtained:

  id  val  out
0  A    1   1
1  B    2   3   (2 + 1)
2  A   -3   -1  (-3 + 2)  
3  C    1   0   (1+ -3 +2)
4  D    5   5   (5 + 1 + -3 + 2_
5  B    6   9   (6 + 5 + 1 + -3)
6  C   -2   6    (-2 + 6 + 5 -3)

Here are some more detailed explanations: 1) The row with id=1 has 3=2+1, because at that time you have 2 groups, As and Bs, each with 1 row, so you have to take that single row from each group.

2) The row with id=2 has -1=-3+2 because at that time, you have 2 groups, As and Bs. The most recent row from the As is 2 A -3 and the single (and thus most recent) row from Bs is 1 B 2, so you add these 2 rows.

3) In the row with id=6, you add up

2  A   -3
4  D    5
5  B    6
6  C   -2

You are taking 1 row from each group, and that is the row that is most recent at that point.

Wai Ha Lee
  • 8,598
  • 83
  • 57
  • 92
Baron Yugovich
  • 3,843
  • 12
  • 48
  • 76
  • Why is second value `3` ? when the id of first is `A` and the id of second is `B` . – Anand S Kumar Sep 30 '15 at 15:44
  • Well, at that point, you have 2 groups, A and B, each with one row. The goal is to take the most recent row from each group, and you have 2 groups, each with 1 row, so you add the val values for them and you get 2+1. – Baron Yugovich Sep 30 '15 at 15:47

1 Answers1

0

this should be a relatively quick and easy way to do this using a loop. the way it works is that it adds a new entry to a dictionary whenever it finds one. if the entry already exists it overwrites the corresponding value.

df = pd.DataFrame({'id': ['A','B','A','C','D','B','C'],
                  'val': [1,2,-3,1,5,6,-2]})

num_rows = df.shape[0]

last_vals = {}
for i in range(0, num_rows):
    x = df['id'][i]
    last_vals[x] = df['val'][i]

sum(last_vals.values())
n8sty
  • 1,418
  • 1
  • 14
  • 26