How do I stop for loop that sums value in a column returning multiple identical values?

Question

Suppose I have the following data frame:

import pandas as pd

df = pd.DataFrame()
df['ID'] = 1, 1, 1, 2, 2, 3, 3
df['a'] = 3, 5, 6, 3, 8, 1, 2

I want to create a for loop that loops over ID and returns the sum of 'a' for that ID. So far I have this:

for i in df['ID']:
    print(i, df.loc[df['ID'] == i, 'a'].sum())

However this returns multiples of the same value like so:

How do I edit my pool so that once it has returned the value for 'id' == 1 it moves on to the next id value rather than just down to the next row?

I'm looking to get the following:

1 14
2 11
3 3

Thanks in advance!

score 1 · Accepted Answer · answered Nov 27 '18 at 15:45

1

This is much better suited to groupby rather than looping (as are many pandas dataframe problems):

>>> df.groupby('ID')['a'].sum()
ID
1    14
2    11
3     3
Name: a, dtype: int64

However, just to explain where your loop went wrong, you can just loop through the unique values of df['ID'], rather than all rows:

for i in df['ID'].unique():
    print(i, df.loc[df['ID'] == i, 'a'].sum())

1 14
2 11
3 3

answered Nov 27 '18 at 15:45

sacuL

Wonderful, both of these work nicely! Could you explain why groupby is better than a loop in this instance other than its a single line – tom91 Nov 27 '18 at 15:48
1

for starters, it's faster (for large dataframes), less error prone, and leads to cleaner code. [Here is an article about it :)](https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6) – sacuL Nov 27 '18 at 15:49
1

@ sacul Thank you for linking that I'll check it out, and thank you for answering the loop question as well! – tom91 Nov 27 '18 at 15:59

1 Answers1