How to iterate over Pandas Series generated from groupby().size()

Question

How do you iterate over a Pandas Series generated from a .groupby('...').size() call and get both the group name and the count?

As an example if I have:

how can I loop over them so that in each iteration I would have -1 & 7, 0 & 85, 1 & 14 and 2 & 5 in variables?

I tried the enumerate option but it doesn't quite work. Example:

for i, row in enumerate(df.groupby(['foo']).size()):
    print(i, row)

It returns 0, 1, 2, 3 for i rather than -1, 0, 1, and 2.

enumerate just counts the items in any sequence; it knows nothing about the internal index of the Series, which is why it yields 0, 1, 2, 3, and it would do the same for any iterable — Leonid Mednikov, Aug 09 '17 at 13:41
Most numeric operations with pandas can be vectorized, which means they are much faster than conventional iteration. OTOH, some operations (such as string and regex) are inherently hard to vectorize. In this case, it is important to understand _how_ to loop over your data. For more information on when and how to loop over your data, please read [For loops with Pandas - When should I care?](https://stackoverflow.com/questions/54028199/for-loops-with-pandas-when-should-i-care/54028200#54028200). — cs95, Jan 04 '19 at 10:17

score 127 · Accepted Answer · edited Jun 20 '20 at 09:12

Update:

Given a pandas Series:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

s
#a    1
#b    2
#c    3
#d    4
#dtype: int64

You can loop over it directly, which yields one value from the Series in each iteration:

for i in s:
    print(i)
#1
#2
#3
#4

If you want to access the index at the same time, use the items method, which lazily yields (index, value) pairs. Older pandas versions also provide iteritems, which behaves the same way but was deprecated in pandas 1.5 and removed in 2.0:

for i, v in s.items():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4

for i, v in s.iteritems():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4
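
Because items() returns a lazy iterator of (index, value) pairs, it can also be materialized when a loop is not needed. A minimal sketch, reusing the Series s from above (the names pairs and lookup are only illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Collect every (index, value) pair into a list of tuples.
pairs = list(s.items())

# Or build a plain dict keyed by the index labels.
lookup = dict(s.items())

print(pairs)
print(lookup['c'])
```

This is only worth doing for small Series, since it copies every pair into Python objects.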

Old Answer:

You can call the iteritems() method on the Series (on pandas 2.0 and later, where iteritems() no longer exists, call items() instead):

for i, row in df.groupby('a').size().iteritems():
    print(i, row)

# 12 4
# 14 2

According to the docs:

Series.iteritems()

Lazily iterate over (index, value) tuples

Note: This is not the same data as in the question, just a demo.
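
For data shaped like the question's, a minimal sketch might look like the following. The DataFrame here is hypothetical: it is built only so that the group sizes match the counts quoted in the question (7, 85, 14 and 5), and it uses items() so that it also runs on recent pandas versions.

```python
import pandas as pd

# Hypothetical data: a 'foo' column whose group sizes match the question.
df = pd.DataFrame({'foo': [-1] * 7 + [0] * 85 + [1] * 14 + [2] * 5})

# size() returns a Series indexed by group name, with the counts as values.
counts = df.groupby('foo').size()

pairs = []
for name, count in counts.items():
    pairs.append((name, count))
    print(name, count)

# -1 7
# 0 85
# 1 14
# 2 5
```

In each iteration, name holds the group label and count holds the number of rows in that group.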

That's the solution I was looking for - thanks. I had tried `.iterrows()` but it did not provide the solution I was looking for. — Reily Bourne, Jul 15 '16 at 04:22
`iterrows()` is the method for a DataFrame; for a Series, `iteritems()` works. — Psidom, Jul 15 '16 at 13:23

score 10 · Answer 2 · answered Nov 05 '18 at 14:26

To expand on Psidom's answer, there are three useful ways to unpack data from a pd.Series. Using the same Series as Psidom:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

A direct loop over s yields the value of each row.
A loop over s.iteritems() or s.items() yields an (index, value) tuple for each row.
Using enumerate() on s.iteritems() or s.items() yields a nested tuple of the form (rownum, (index, value)).

The last way is useful when your index carries information other than the row number (e.g. a time series where the index is time).

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

# s.items() is equivalent to s.iteritems(), which was removed in pandas 2.0
for rownum, (indx, val) in enumerate(s.items()):
    print('row number: ', rownum, 'index: ', indx, 'value: ', val)

will output:

row number:  0 index:  a value:  1
row number:  1 index:  b value:  2
row number:  2 index:  c value:  3
row number:  3 index:  d value:  4

You can read more on unpacking nested tuples here.
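
As an illustration of the time-series case mentioned above, here is a minimal sketch that assumes a hypothetical daily DatetimeIndex. It uses items(), so it should also run on pandas 2.0 and later:

```python
import pandas as pd

# Hypothetical time series: four daily observations starting 2020-01-01.
s = pd.Series([1, 2, 3, 4],
              index=pd.date_range('2020-01-01', periods=4, freq='D'))

rows = []
for rownum, (indx, val) in enumerate(s.items()):
    # rownum is the position; indx is the timestamp label; val is the value.
    rows.append((rownum, indx, val))
    print('row number: ', rownum, 'index: ', indx.date(), 'value: ', val)
```

Here the row number and the index carry different information, which is the situation where the nested unpacking pays off.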
