Is there a way to have a counter variable in the function called through a pandas groupby apply?

def func(group):
    # Get the counter for how many times this func has been called
    pass

df.groupby('A').apply(func)

This is what I am doing right now:

grp = df.groupby('A')
idx = 1
for name, group in grp:
    print(name)
    func(group, idx)  # pass the current group, not the whole GroupBy object
    idx += 1
user308827

1 Answer

Note: how many times the function passed to apply is called is an implementation detail. It may depend on the return type and on whether the apply takes the slow or fast path.

I would count the number of times a function is called by updating an attribute on this function:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  3  4

In [13]: def f(a):
             f.i += 1
             return a

In [14]: f.i = 0  # set the number of calls to 0

In [15]: g = df.groupby('A')

In [16]: g.apply(f)
Out[16]:
   A  B
0  1  2
1  3  4

In [17]: f.i
Out[17]: 3

As we see, f is called 3 times, which is (perhaps surprisingly) one more than the two groups in df.

To get the number of groups you can use the ngroups attribute:

In [18]: g.ngroups
Out[18]: 2
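
If what you need is a stable index per group rather than a raw call count, one option is to avoid counting calls at all. The sketch below is only an illustration: `func` and the `group_idx` column are hypothetical names, and it assumes the groups should be numbered in the order the groupby iterates over them. It shows the explicit loop with `enumerate`, and `ngroup()`, which numbers the groups from 0.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
g = df.groupby('A')


def func(group, idx):
    # Hypothetical per-group work: tag every row with the group's position.
    return group.assign(group_idx=idx)


# Iterating the GroupBy calls func exactly once per group, so idx is reliable.
results = [func(group, idx) for idx, (name, group) in enumerate(g, start=1)]
out = pd.concat(results)

# ngroup() labels each row with its group number (starting at 0) without a loop.
labels = g.ngroup() + 1
```

Because the loop is explicit, the index does not depend on how many times apply happens to call the function internally.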
Andy Hayden
  • thanks! this is helpful to know. Don't know if I am alone in wanting a counter..... – user308827 May 01 '15 at 03:59
  • Isn't the first call to apply the initialisation of the groups though, I thought I saw this as an explanation in a previous answer somewhere... – EdChum May 01 '15 at 07:28
  • @EdChum yeah, it's to get the correct type of the result. It *may* be that this function is called more times than you'd think. IIRC the result of the previous discussion was that this is an implementation detail, that is, it's not guaranteed to be `g.ngroups+1`. One example where it may be precisely n is if you pass a ufunc. – Andy Hayden May 01 '15 at 07:50
  • @user308827 As Ed mentions, this has come up before (you are not alone!) – Andy Hayden May 01 '15 at 07:52
  • [This](https://github.com/pydata/pandas/issues/7739) is by design, also related: http://stackoverflow.com/questions/21390035/python-pandas-groupby-object-apply-method-duplicates-first-group – EdChum May 01 '15 at 07:54
  • @EdChum I would argue it's not so much "by design" rather it's an "implementation detail" :) – Andy Hayden May 01 '15 at 08:04
  • thanks again, @Andy. In your solution, what is f.i? It looks like a class member but haven't seen syntax like that before.... – user308827 May 01 '15 at 13:32
  • @user308827 it's an attribute on the function f. In Python you can "monkey patch" many objects with your own attributes like this. – Andy Hayden May 01 '15 at 17:26
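
The function-attribute idea from the last comment can be sketched in plain Python, without pandas. This is a minimal illustration rather than anything taken from the answer: `count_calls` and the `calls` attribute are hypothetical names. The counter lives on the function object, so no global variable is needed.

```python
import functools


def count_calls(func):
    """Wrap func so that every call increments wrapper.calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0  # attribute stored on the function object itself
    return wrapper


@count_calls
def f(a):
    return a


f(10)
f(20)
print(f.calls)  # 2
```

Setting `f.calls = 0` again resets the counter, just like `f.i = 0` in the answer.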