I had always thought Series.apply was a loop over the rows, and we know that DataFrame.apply(axis=1) scales horribly (https://stackoverflow.com/a/55557758/4333359).
But while trying to get the midpoints of a Series of pandas._libs.interval.Interval objects with a dtype of category, I noticed that Series.apply does not seem to be looping over the rows at all and is as fast as something like Series.map.
In the following example I get the midpoint of each Interval in two ways: by applying a lambda (which I expected to be slow), or by doing something I thought should be very fast, namely building a dict from the 3 unique intervals to their midpoints (looping over only those 3 categories) and mapping the Series through it.
So what's going on? Why is Series.apply so fast here, or is the map perhaps just very slow?
import perfplot
import pandas as pd
import numpy as np


def apply_mid(s):
    return s.apply(lambda x: x.mid)


def map_mid(s):
    d = {x: x.mid for x in s.cat.categories}
    return s.map(d)


perfplot.show(
    setup=lambda n: pd.cut(pd.Series(np.random.randint(1, 100, 3*n)), 3),
    kernels=[
        lambda s: apply_mid(s),
        lambda s: map_mid(s),
    ],
    labels=['apply', 'map'],
    n_range=[2 ** k for k in range(24)],
    equality_check=np.allclose,
    xlabel='~len(df)'
)
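
One way to probe this without perfplot might be to count how often the applied function is actually invoked. The following is only a diagnostic sketch: the counter dict and the helper counting_mid are names I made up for illustration, the Series size of 3000 is arbitrary, and the number of calls it reports may well depend on the pandas version, so no particular count is claimed here.

```python
import numpy as np
import pandas as pd

# Small categorical Series of Interval objects, built like the benchmark input.
rng = np.random.default_rng(0)
s = pd.cut(pd.Series(rng.integers(1, 100, 3000)), 3)

# Hypothetical helper state: a mutable counter updated on every invocation.
counter = {"calls": 0}


def counting_mid(interval):
    """Return the interval's midpoint and record that the function ran."""
    counter["calls"] += 1
    return interval.mid


applied = s.apply(counting_mid)

# Reference result: map each category to its midpoint via a dict.
mapped = s.map({iv: iv.mid for iv in s.cat.categories})

print("rows:", len(s))
print("categories:", len(s.cat.categories))
print("function calls made by apply:", counter["calls"])
print("same values:", np.allclose(applied.astype(float), mapped.astype(float)))
```

If the reported number of calls is close to the number of categories rather than the number of rows, that would suggest apply is not iterating row by row for this dtype, which would be consistent with the timings above.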