83

How do you iterate over a pandas Series generated from a .groupby('...').size() call and get both the group name and the count?

As an example, if I have:

foo
-1     7
 0    85
 1    14
 2     5

How can I loop over them so that in each iteration I have -1 & 7, 0 & 85, 1 & 14 and 2 & 5 in variables?

I tried the enumerate option but it doesn't quite work. Example:

for i, row in enumerate(df.groupby(['foo']).size()):
    print(i, row)

It doesn't return -1, 0, 1, and 2 for i but rather 0, 1, 2, 3.

Reily Bourne
  • enumerate just counts the items of any iterable; it knows nothing about the internal index of the Series. That is why it gives 0, 1, 2, 3, and it will be the same for any iterable. – Leonid Mednikov Aug 09 '17 at 13:41
  • Most numeric operations with pandas can be vectorized, which means they are much faster than conventional iteration. OTOH, some operations (such as string and regex) are inherently hard to vectorize. In this case, it is important to understand _how_ to loop over your data. For more information on when and how to loop over your data, please read [For loops with Pandas - When should I care?](https://stackoverflow.com/questions/54028199/for-loops-with-pandas-when-should-i-care/54028200#54028200). – cs95 Jan 04 '19 at 10:17

2 Answers

127

Update:

Given a pandas Series:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

s
#a    1
#b    2
#c    3
#d    4
#dtype: int64

You can loop through it directly, which yields one value from the Series in each iteration:

for i in s:
    print(i)
#1
#2
#3
#4

If you want to access the index at the same time, use the items method, which lazily yields (index, value) pairs. The older iteritems method does the same thing, but it was deprecated in pandas 1.5 and removed in pandas 2.0, so it only works on older versions:

for i, v in s.items():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4

# Note: iteritems() is only available on pandas < 2.0
for i, v in s.iteritems():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4
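
Applied to the case in the question, the same pattern might look like the sketch below. The DataFrame here is hypothetical, built only so that the group sizes match the counts shown in the question; the column name foo comes from the question.

```python
import pandas as pd

# Hypothetical data, constructed only to reproduce the counts in the question.
df = pd.DataFrame({"foo": [-1] * 7 + [0] * 85 + [1] * 14 + [2] * 5})

counts = df.groupby("foo").size()

pairs = []
for name, count in counts.items():
    # name is the group label (the Series index), count is the group size
    pairs.append((name, count))
    print(name, count)
```

Each iteration gives the group label and its count, rather than the 0, 1, 2, 3 positions that enumerate produces.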

Old Answer:

You can call the iteritems() method on the Series (on pandas 2.0 and later, where iteritems() has been removed, call items() instead):

for i, row in df.groupby('a').size().iteritems():
    print(i, row)

# 12 4
# 14 2

According to the docs:

Series.iteritems()

Lazily iterate over (index, value) tuples

Note: This is not the same data as in the question, just a demo.
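
To make the difference from enumerate concrete, here is a minimal sketch. The frame is an assumption: the answer does not show its df, so this one is made up to give the same 12 4 and 14 2 output, and it uses items() so that it also runs on pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical frame: four rows with a == 12 and two rows with a == 14.
df = pd.DataFrame({"a": [12, 12, 12, 12, 14, 14]})
sizes = df.groupby("a").size()

# enumerate() only counts positions; it never looks at the Series index.
positions = [i for i, _ in enumerate(sizes)]

# items() pairs each index label with its value.
labels = [i for i, _ in sizes.items()]
pairs = list(sizes.items())

print(positions)
print(labels)
```

Here positions holds 0 and 1, while labels holds the group names 12 and 14.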

Community
Psidom
  • That's the solution I was looking for, thanks. I had tried `.iterrows()`, but it did not do what I needed. – Reily Bourne Jul 15 '16 at 04:22
  • `iterrows()` is the method for a DataFrame; for a Series, `iteritems()` works. – Psidom Jul 15 '16 at 13:23
10

To expand upon Psidom's answer, there are three useful ways to unpack data from a pd.Series. Using the same Series as Psidom:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

  • A direct loop over s yields the value of each row.
  • A loop over s.iteritems() or s.items() yields an (index, value) tuple for each row.
  • Using enumerate() on s.iteritems() yields a nested tuple in the form of: (rownum,(index,value)).

The last way is useful when your index carries information other than the row number (e.g. a time series, where the index is a timestamp).

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

# Note: on pandas >= 2.0, iteritems() has been removed; use s.items() instead
for rownum, (indx, val) in enumerate(s.iteritems()):
    print('row number: ', rownum, 'index: ', indx, 'value: ', val)

will output:

row number:  0 index:  a value:  1
row number:  1 index:  b value:  2
row number:  2 index:  c value:  3
row number:  3 index:  d value:  4
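
For the time series case mentioned above, a minimal sketch could look like this. The dates and values are made up for illustration, and the loop uses items(), which the list above names alongside iteritems().

```python
import pandas as pd

# Hypothetical daily series; the index holds timestamps, not row numbers.
idx = pd.date_range("2024-01-01", periods=3, freq="D")
ts = pd.Series([10, 20, 30], index=idx)

rows = []
for rownum, (stamp, val) in enumerate(ts.items()):
    # rownum is the position, stamp is the pandas Timestamp label
    rows.append((rownum, stamp.strftime("%Y-%m-%d"), val))
    print("row number:", rownum, "index:", stamp.date(), "value:", val)
```

The row number and the timestamp are both available in each iteration, which is the point of wrapping items() in enumerate().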

You can read more on unpacking nested tuples here.

dbouz