
I've been using the pandas apply method on both Series and DataFrames, but I'm obviously still missing something, because I'm stumped by a simple function I'm trying to execute.

This is what I was doing:

def minmax(row):
    return (row - row.min())/(row.max() - row.min())

row.apply(minmax)

But this returns an all-zero Series. For example, if

row = pd.Series([0, 1, 2])

then

minmax(row)

returns [0.0, 0.5, 1.0], as desired. But row.apply(minmax) returns [0, 0, 0].

I believe this is because the Series holds ints and the integer division returns 0. However, I don't understand

  • why it works with minmax(row) (shouldn't it act the same?),
  • how to cast it correctly in the apply function to return appropriate float values (I've tried to cast it using .astype and this gives me all NaNs, which I also don't understand), and
  • why, if I apply this to a DataFrame, as df.apply(minmax), it also works as desired. (edit added)

I suspect I'm missing something fundamental in how apply works... or being dense. Either way, thanks in advance.

Renée

1 Answer


When you call row.apply(minmax) on a Series, the function receives the values one at a time, not the Series as a whole. This is called element-wise application.

Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

When you call df.apply(minmax) on a DataFrame, either whole columns (the default, axis=0) or whole rows (axis=1) are passed to the function, according to the value of axis.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty. This is called row-wise or column-wise.

This is why your example works as expected on the DataFrame and not on the Series. Check this answer for information on mapping functions to Series.
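The difference can be sketched as follows. This is a minimal illustration, assuming Python 3 (where / is true division) and a reasonably recent pandas; the names df, lo, and hi are made up for the example and are not from the question.

```python
import pandas as pd

def minmax(values):
    """Rescale a whole Series to the range [0, 1]."""
    return (values - values.min()) / (values.max() - values.min())

row = pd.Series([0, 1, 2])

# Direct call: the function receives the whole Series.
scaled = minmax(row)

# Series.apply: the function receives one scalar at a time, so the
# minimum and maximum have to be computed outside the function.
lo, hi = row.min(), row.max()
scaled_elementwise = row.apply(lambda x: (x - lo) / (hi - lo))

# DataFrame.apply (default axis=0): the function receives each
# column as a Series, so minmax works unchanged.
df = pd.DataFrame({"a": [0, 1, 2], "b": [10, 20, 40]})
scaled_df = df.apply(minmax)

print(scaled.tolist())
print(scaled_elementwise.tolist())
print(scaled_df)
```

For a whole-Series operation like this one, calling minmax(row) directly is usually the simplest option; apply on a Series is mainly useful when the function really is meant to work on single values.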

Romain
  • Ah! I was ignoring that the only functions that can be applied to the entire Series are NumPy functions. So, in this particular case, apply is working the same as map would, I believe. Can you explain why Python doesn't get upset about applying .min() and .max() to the values? In this context, it doesn't make sense and I would have expected an error. I did this, row.apply(lambda x: x - x.min()), to test out what you were saying, and it also returns [0,0,0], so I gather x.min() == x, but I would think it would error. Thanks! – Renée Jun 09 '16 at 21:29
  • This is weird, in my test the call of the function on the `Series` produces an error `AttributeError: 'int' object has no attribute 'min'`. This seems to be the expected behavior. – Romain Jun 09 '16 at 21:40
  • Hmm. What pandas are you using? I've recreated it without error both in Jupyter (where I was working) and as a script in PyCharm. Raising an error would have saved me a lot of time. :) I'll update to 0.18.1 and see. Thanks again. – Renée Jun 09 '16 at 21:58
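One possible explanation for the two different results in the comments, offered as an assumption and not something confirmed in the thread: if a given pandas version hands the elements to the function as NumPy scalars, .min() and .max() exist and simply return the value itself, whereas plain Python ints have no such methods. The sketch below only shows that difference between the two scalar types; which type Series.apply actually passes may depend on the pandas version.

```python
import numpy as np

# A NumPy integer scalar carries array-style methods, so .min() and
# .max() succeed and return the scalar's own value.
x = np.int64(2)
print(x.min(), x.max())

# A plain Python int has no .min attribute, which would produce the
# AttributeError mentioned in the comments.
print(hasattr(2, "min"))
```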