
I am creating a series of calculations in my dataframe and have been using apply successfully, until the example below. Can anyone explain why "transform" works in this instance but "apply" does not? I've been doing addition and subtraction with apply successfully, so the only new aspect is the np.where.

It doesn't throw an error; it just returns NaNs for the column.

None of the articles I can find say that apply should have this type of limitation. There is lots of information suggesting transform should be the more limited of the two, namely by only being able to process a single column at a time, and by being forced to return as many values as the series is long.

df['val'] = compiled.groupby(['category']).B.apply(lambda x : np.where(x > 0, x, 0))

df['val'] = compiled.groupby(['category']).B.transform(lambda x : np.where(x > 0, x, 0))
Fred Smith

1 Answer


df.groupby('category').V.apply(f), when f returns a numpy array, will return a Series with one item per category, where each item is the whole array for that group:

import numpy as np
import pandas as pd
np.random.seed(1701)
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'B': np.random.randn(6)
})
df.groupby('category').B.apply(lambda x : np.where(x > 0, x, 0))
# category
# A    [0.0, 2.3759516516254156, 0.0]
# B                   [0.0, 0.0, 0.0]
# Name: B, dtype: object

df.groupby('category').V.transform(f), when f returns a numpy array, will return a Series with one item per row in the original dataframe:

df.groupby('category').B.transform(lambda x : np.where(x > 0, x, 0))
# 0    0.000000
# 1    2.375952
# 2    0.000000
# 3    0.000000
# 4    0.000000
# 5    0.000000
# Name: B, dtype: float64

Since you are assigning the result to a column in the original dataframe, transform is the appropriate method to use.
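To make the NaNs concrete, here is a minimal sketch with made-up values (the column names from_apply and from_transform are only for illustration, not from the question). Column assignment aligns on the index, and the apply result is indexed by category label rather than by the original row labels, so nothing lines up:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'B': [-1.0, 2.5, -0.3, 0.7, -4.0, 1.2],
})

# apply + numpy array: one entry per group, indexed by 'A' and 'B'
per_group = df.groupby('category')['B'].apply(lambda x: np.where(x > 0, x, 0))

# Assignment aligns on index; the row labels 0..5 never match 'A'/'B',
# so every row ends up as NaN
df['from_apply'] = per_group

# transform keeps the original row index, so the values line up
df['from_transform'] = df.groupby('category')['B'].transform(
    lambda x: np.where(x > 0, x, 0)
)

print(df)
```

The from_apply column comes out entirely NaN, while from_transform holds the clipped values row by row.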

Note that the behavior of apply is similar to that of transform if f returns a pandas Series, which may be why apply worked for you in the past.
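As a rough illustration of that point, again with made-up values: if f returns a Series that keeps the group's index (here via Series.where instead of np.where), apply can give a row-aligned result. group_keys=False is passed because, as far as I know, newer pandas versions otherwise prepend the group key to the index of the result.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'B': [-1.0, 2.5, -0.3, 0.7, -4.0, 1.2],
})

# f returns a pandas Series carrying the original row labels
applied = df.groupby('category', group_keys=False)['B'].apply(
    lambda x: x.where(x > 0, 0)
)

# Same computation via transform, with f returning a numpy array
transformed = df.groupby('category')['B'].transform(
    lambda x: np.where(x > 0, x, 0)
)

print(applied.index.equals(df.index))
print(applied.tolist() == transformed.tolist())
```

Both results share the dataframe's row index here, so either one could be assigned to a column in this case.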

See this answer for a more in-depth discussion of the differences between apply and transform.

jakevdp