Suppose I have the below dataframe:

>>>from itertools import groupby
>>>import pandas as pd

>>>idx1 = pd.date_range('2019-01-01',periods=5)
>>>idx2 = pd.date_range('2020-06-01',periods=5)
>>>idx3 = pd.date_range('2021-08-15',periods=5)
>>>idx4 = pd.date_range('2022-03-20',periods=5)
>>>idx = idx1.union(idx2).union(idx3).union(idx4)

>>>l = [1,-1,-4,2,-3,4,5,1,-3,-4,-5,-3,-4,2,3,-1,-2,3,2,3]

>>>df = pd.DataFrame(l, index=idx, columns=['a'])
>>>df
            a
2019-01-01  1
2019-01-02 -1
2019-01-03 -4
2019-01-04  2
2019-01-05 -3
2020-06-01  4
2020-06-02  5
2020-06-03  1
2020-06-04 -3
2020-06-05 -4
2021-08-15 -5
2021-08-16 -3
2021-08-17 -4
2021-08-18  2
2021-08-19  3
2022-03-20 -1
2022-03-21 -2
2022-03-22  3
2022-03-23  2
2022-03-24  3

>>>for k,g in groupby(df['a'], lambda x: x<0):
       print(k, sum(g))

False 1
True -5
False 2
True -3
False 10
True -19
False 5
True -3
False 8

How can I count the number of elements in each group? I tried applying the built-in len() but got the error below:

>>>for k,g in groupby(df['a'], lambda x: x<0):
       print(k,len(g))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [241], in <cell line: 1>()
      1 for k,g in groupby(df['a'], lambda x: x<0):
----> 2     print(k,len(g))

TypeError: object of type 'itertools._grouper' has no len()
jgg
  • @sammywemmy it's object of type `itertools._grouper` – jgg Oct 05 '22 at 20:40
  • My bad. Missed that. Run `dir` on the grouper to see its attributes – sammywemmy Oct 05 '22 at 20:41
  • `dir(groupby(df['a'], lambda x: x<0))` produces the following: ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__'] – jgg Oct 05 '22 at 20:48
  • Sorry, run dir on `g` – sammywemmy Oct 05 '22 at 20:49
  • @sammywemmy Same result ....`for k,g in groupby(df['a'], lambda x: x<0): print(k,dir(g)) break ` False ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] – jgg Oct 05 '22 at 20:51
  • Hmmmm.. kk I'll get on my PC and have a look – sammywemmy Oct 05 '22 at 21:14
  • Convert to a list/tuple, and then get the length: `for k, g in groupby(df['a'], lambda x: x<0): print(k, len(list(g)))` – sammywemmy Oct 05 '22 at 21:54
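
To make the `dir` output above concrete, here is a minimal sketch (using only the first five values from the question as a plain list, which is an assumption made to keep it self-contained) showing that each group exposes `__iter__` but not `__len__`, and that a count can still be obtained by consuming the iterator:

```python
from itertools import groupby

values = [1, -1, -4, 2, -3]  # first five values from the question

checks = []
counts = []
for is_negative, group in groupby(values, lambda x: x < 0):
    # The group object is an iterator: it has __iter__ but no __len__.
    checks.append((hasattr(group, "__iter__"), hasattr(group, "__len__")))
    # Counting by consuming the iterator avoids building a list.
    counts.append((is_negative, sum(1 for _ in group)))

print(checks[0])  # (True, False)
print(counts)     # [(False, 1), (True, 2), (False, 1), (True, 1)]
```

Note that each group can be consumed only once, so the count has to be taken before (or instead of) any other pass over it.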

1 Answer

I'm still unsure why len doesn't work when sum does. Nonetheless, the code below produces the desired result: the length of each run of negative values.

>>>[len(list(g)) for k,g in groupby(df['a'], lambda x: x<0) if k]
[2, 1, 5, 2]
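
If both the count and the sum of each run are needed, one possible approach is to materialize each group once and reuse the list. This is only a sketch: `run_stats` is a hypothetical helper name, and the question's column is represented here as a plain list, though the same loop should work on `df['a']`.

```python
from itertools import groupby

# Values from the question's column 'a', as a plain list.
values = [1, -1, -4, 2, -3, 4, 5, 1, -3, -4, -5, -3, -4, 2, 3, -1, -2, 3, 2, 3]

def run_stats(seq):
    """Return (is_negative, count, total) for each consecutive run."""
    stats = []
    for is_negative, group in groupby(seq, lambda x: x < 0):
        items = list(group)  # a group can be iterated only once, so store it
        stats.append((is_negative, len(items), sum(items)))
    return stats

stats = run_stats(values)
negative_counts = [count for is_negative, count, _ in stats if is_negative]
print(negative_counts)  # [2, 1, 5, 2]
```

Calling `sum(g)` and then `len(list(g))` on the same group would not work, because the first call exhausts the iterator and the second would see it empty.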
jgg
  • `sum` can be applied to an iterable; `len` can't. The grouper is an iterable – sammywemmy Oct 05 '22 at 21:55
  • As an aside, why are you using itertools' `groupby` instead of pandas' `groupby` function? – sammywemmy Oct 05 '22 at 21:55
  • @sammywemmy great question. With pandas `groupby`, one has to chain the `apply` method after `groupby` and also define a function to be used as an argument to `apply`. I recently posted a question about this and am working on solving. – jgg Oct 05 '22 at 21:59
  • @sammywemmy but a list is an iterable and I can apply `len` to a list? Are there differences between the types of iterators? – jgg Oct 05 '22 at 22:02
  • My bad. Ignore my horrible definitions. The grouper has `__iter__`, so it can be iterated over, which is why `sum` works; however, it does not have a `__len__` implementation, so you can't get the length until you convert it to a sequence, which does have a `__len__` implementation. – sammywemmy Oct 05 '22 at 22:08
  • You can run a for loop over a pandas `groupby` if you don't want to use `apply`. – sammywemmy Oct 05 '22 at 22:08
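
As a rough sketch of the pandas route mentioned in the last comment: pandas `groupby` groups by key value rather than by consecutive runs, so a run label is needed first. The shift-and-cumsum labelling below is a common idiom, not something taken from the thread, and the names `is_negative` and `run_id` are illustrative.

```python
import pandas as pd

values = [1, -1, -4, 2, -3, 4, 5, 1, -3, -4, -5, -3, -4, 2, 3, -1, -2, 3, 2, 3]
s = pd.Series(values, name="a")

is_negative = s < 0
# A new run starts wherever the sign flag differs from the previous row;
# the cumulative sum of those change points labels each consecutive run.
run_id = (is_negative != is_negative.shift()).cumsum()

negative_counts = []
for _, run in s.groupby(run_id):
    # Each `run` is a Series, so len() works on it directly.
    if run.iloc[0] < 0:
        negative_counts.append(len(run))

print(negative_counts)  # [2, 1, 5, 2]
```

Unlike the `itertools` groups, each group here is a full Series, which is why `len` can be applied without converting anything.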