I'm new to Pandas and am trying some basic data transformation exercises. One method I tried is groupby, but I don't understand the output I am seeing.

import pandas as pd

df = pd.DataFrame({'row': range(10), 'time': range(10), 'machine': ['M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1'], 'value1': range(10), 'value2': range(10)})

def func(g):
    # Show the type of whatever groupby.apply passes in
    print('----', type(g))
    return 42

# Grouping along the rows (axis 0) is the default
print(df.groupby('machine').apply(func))

Why does the print statement in the function run 4 times? I expected groupby to split df into 3 DataFrames (one per machine) and apply func to each of them. But this is not what I observe...

The complete output:

---- <class 'pandas.core.frame.DataFrame'>
---- <class 'pandas.core.frame.DataFrame'>
---- <class 'pandas.core.frame.DataFrame'>
---- <class 'pandas.core.frame.DataFrame'>
machine
M1         42
M2         42
M3         42
dtype: int64

Update

I just found this duplicate.

  • IIRC, `groupby.apply` does some inference on what the output of `func` will be by calling it. In this case it sees that the output is an aggregation (since calling `func` returns a single number). Notice that the actual output is just a `Series` with 3 items. – TomAugspurger Mar 26 '14 at 00:12
  • So in other words, the first call is for the inference and the other 3 are the actual application of the function. I'm not saying there's anything wrong with it, but I was just curious as I didn't expect this behaviour and thought I might have done something wrong... – orange Mar 26 '14 at 00:30
  • See also the end of this section: http://pandas-docs.github.io/pandas-docs-travis/groupby.html#flexible-apply A clarification on this was just added to the development version of the documentation. – joris Apr 02 '14 at 08:27
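
To check the behaviour described in the comments on your own install, here is a minimal sketch that records which group each call receives instead of printing. The `calls` list and the trimmed-down frame are only for illustration and are not part of the original question. On the pandas version used in the question the first group should show up twice (the extra call being the inference step); newer releases may call `func` only once per group, so the exact count is not guaranteed.

```python
import pandas as pd

# Trimmed-down version of the frame from the question (illustrative only)
df = pd.DataFrame({
    'machine': ['M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1'],
    'value1': range(10),
})

calls = []  # hypothetical helper: first 'value1' of each group func receives

def func(g):
    # The first value1 identifies the group: 0 -> M1, 1 -> M2, 2 -> M3
    calls.append(int(g['value1'].iloc[0]))
    return 42

# Select the value column explicitly so func only sees 'value1'
result = df.groupby('machine')[['value1']].apply(func)

print(result)      # one row per machine, regardless of how often func ran
print(len(calls))  # number of times func was actually called
print(calls)       # a repeated leading entry means the first group was passed twice
```

Either way, the result has exactly one entry per group, so the extra call only matters if `func` has side effects such as printing or writing to a file.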
