I'm trying to understand the differences between the transform and apply methods in pandas. This answer was pretty helpful. The first major difference applies only when transform is called on a DataFrameGroupBy object, so it does not affect calls made on a Series. The second major difference is that transform checks that the output length of the function matches its input length. So I tried to come up with a test case in which I call both methods on a Series, and apply executes successfully while transform fails because of the length difference. What I ended up with is this:
some_series.apply(lambda x: pd.Series([1,2]))
some_series.transform(lambda x: pd.Series([1,2]))
To my surprise, both calls successfully produced a DataFrame with two columns, in exactly the same way. If transform uses the len() function to check the length, that makes sense, since len() returns the number of rows, and that doesn't change. But then I'm having a hard time coming up with any situation where the number of rows could change, because the number of function calls that transform and apply make seems to be predetermined by the number of rows in the original Series, regardless of what each function call returns.
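For anyone who wants to reproduce this, here is a minimal self-contained sketch of the experiment. The contents of some_series are not shown above, so the three-element Series here is only a stand-in, and the helper name expand is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the post does not show what some_series contains.
some_series = pd.Series([10, 20, 30])


def expand(x):
    # Called once per element of the Series; always returns a two-element Series.
    return pd.Series([1, 2])


applied = some_series.apply(expand)
transformed = some_series.transform(expand)

# Both results are DataFrames with one row per original element.
print(type(applied).__name__, applied.shape)
print(type(transformed).__name__, transformed.shape)

# The row count is unchanged in both cases, so a len()-based check would pass.
print(len(applied) == len(some_series), len(transformed) == len(some_series))
```

Since the function is called once per element, the number of rows in the result follows the original index, whatever each call returns.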
So the question is: is there any situation where the behavior of transform and apply differs at all when they are called on a Series object? If not, it seems that only apply should be used, because it is faster.