How do `sum`, `'sum'` and `np.sum` differ, under the bonnet, here:

df.agg(x=('A', sum), y=('B', 'sum'), z=('C', np.sum))

given that the output would, arguably, be identical?

This is adapted from the example here:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.aggregate.html

df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))

     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

My guess is that the last of the three is linked to NumPy and the first two may be linked to Python (and/or pandas), but that's just a rough, uneducated first guess. It would also be interesting to know what the single quotes signify in this context.

nutty about natty
  • Unfortunately there is no `mean` function in the Python [builtins](https://docs.python.org/3/library/functions.html), so your code will result in a runtime error; `x=('A', mean)` is actually invalid. – Ahmed AEK Jul 19 '22 at 15:24
  • No, the code actually works... EDIT: it works for `sum`... so will update the question... – nutty about natty Jul 19 '22 at 15:43
  • What's the question? What you posted shows the results of max, min, and mean, not different `sum` calls. The only difference is speed anyway, as pandas is built on top of NumPy, which in turn uses vectorization to speed up operations like `sum`. – Panagiotis Kanavos Jul 19 '22 at 15:57
  • The `single apostrophe` only creates a string; maybe it is later converted back to a function by name, using a dictionary like `{"sum": numpy.sum, ...}` or `{"sum": pd.DataFrame.sum, ...}`. – furas Jul 19 '22 at 15:57

1 Answer

When you call df.agg('sum') it invokes df.sum() (see this answer for an explanation).
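As a rough illustration of that equivalence, here is a minimal sketch. The DataFrame and its values are made up for the example (they are not the data from the docs page), and the `getattr` line is only an approximation of what a lookup by name amounts to, not a description of the pandas internals.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration
df = pd.DataFrame({"A": [1, 2, 7], "B": [2, 3, 5], "C": [4, 6, 8]})

by_string = df.agg("sum")           # aggregation named by a string
by_method = df.sum()                # the DataFrame method called directly
by_getattr = getattr(df, "sum")()   # looking the method up by name, then calling it

# All three produce the same per-column totals
assert by_string.equals(by_method)
assert by_getattr.equals(by_method)

# The named-aggregation form from the question, using strings only
named = df.agg(x=("A", "sum"), y=("B", "sum"))
print(named)
```

Using the string form also sidesteps any question of which callable is actually run, since the name can only refer to the pandas method.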

df.sum and np.sum(df) will have very similar performance: pandas objects implement numpy's array protocols, and a call to np.sum(df) is handed off to the DataFrame's own sum method, so it ends up doing something similar to df.apply(pd.Series.sum) under the hood. Both of these will be faster than the builtin sum for any meaningfully sized DataFrame, because the data is already stored as an array and can be reduced in one vectorized pass, whereas the builtin sum loops over the elements one at a time in Python.
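To check this on your own machine, something like the following sketch could be used. The Series `s`, its size, and the repeat count are arbitrary choices for illustration. Timings vary by machine and library version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative data: 100,000 integers in a single Series
s = pd.Series(np.arange(100_000, dtype="int64"))

total_builtin = sum(s)    # Python-level loop over individual elements
total_pandas = s.sum()    # vectorized reduction on the underlying array
total_numpy = np.sum(s)   # NumPy hands off to the Series' own sum method

# All three agree on the value
assert total_builtin == total_pandas == total_numpy

# Print the timings rather than assuming any particular numbers
for label, fn in [
    ("builtin sum", lambda: sum(s)),
    ("Series.sum", lambda: s.sum()),
    ("np.sum", lambda: np.sum(s)),
]:
    print(label, timeit.timeit(fn, number=5))
```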

See the pandas guide to enhancing performance for more tips.

Michael Delgado