Why is Series.apply so fast on a column with Intervals?

Question

I had always thought `Series.apply` was essentially a loop over the rows, and we know that `DataFrame.apply(axis=1)` scales horribly: https://stackoverflow.com/a/55557758/4333359.
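A quick sketch for testing the "loop over the rows" intuition: count how many times the function passed to `Series.apply` runs on an ordinary integer Series. `count_and_double` is a hypothetical helper used only for counting. On a plain (non-categorical) Series the count is expected to equal the number of rows.

```python
import numpy as np
import pandas as pd

calls = 0

def count_and_double(x):
    """Hypothetical helper: records each invocation, then doubles x."""
    global calls
    calls += 1
    return x * 2

rng = np.random.default_rng(0)
s = pd.Series(rng.integers(1, 100, 3000))  # plain int64 Series, no categories
result = s.apply(count_and_double)

# On a plain Series the two numbers should match: one call per row
print(calls, len(s))
```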

But when trying to get the midpoint of a Series of `pandas._libs.interval.Interval` objects with a `category` dtype, I noticed that `Series.apply` does not seem to loop over the rows at all: it is as fast as something like `Series.map`.
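One way to see whether the lambda really runs once per row here is to count its calls on the same kind of categorical Series the question builds. This is a diagnostic sketch, and `count_and_mid` is a hypothetical helper. If the count matches the number of categories rather than the number of rows, the lambda is only being evaluated per category. That would be consistent with `Series.apply` delegating to the categorical array's own `map` for a `category` dtype, which is an assumption worth checking against the installed pandas version.

```python
import numpy as np
import pandas as pd

calls = 0

def count_and_mid(iv):
    """Hypothetical helper: records each invocation, returns the midpoint."""
    global calls
    calls += 1
    return iv.mid

# Same shape of setup as the question: ints binned into 3 interval categories
rng = np.random.default_rng(0)
s = pd.cut(pd.Series(rng.integers(1, 100, 3000)), 3)
mids = s.apply(count_and_mid)

n_categories = len(s.cat.categories)
# Compare the call count against both the categories and the rows
print(calls, n_categories, len(s))
```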

In the following example I try to get the midpoint of an Interval either by applying a lambda (which should be slow?) or by doing something I thought should be very fast: mapping the 3 intervals to their midpoints, so that the loop runs over only the 3 unique intervals.

So what's going on? Why is `Series.apply` so fast here, or is `map` just very slow?

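```python
import perfplot
import pandas as pd
import numpy as np

def apply_mid(s):
    # Call Interval.mid through apply (expected to loop over every row)
    return s.apply(lambda x: x.mid)

def map_mid(s):
    # Compute the midpoint once per unique interval, then map rows onto it
    d = {x: x.mid for x in s.cat.categories}
    return s.map(d)

perfplot.show(
    # Series of 3*n random ints, binned by pd.cut into 3 interval categories
    setup=lambda n: pd.cut(pd.Series(np.random.randint(1, 100, 3*n)), 3),
    kernels=[apply_mid, map_mid],
    labels=['apply', 'map'],
    n_range=[2 ** k for k in range(24)],
    # Compare as floats; both results may come back with a category dtype
    equality_check=lambda a, b: np.allclose(a.astype(float), b.astype(float)),
    xlabel='~len(df)'
)
```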
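The `map_mid` kernel above already does the per-interval work only once per category. As a sketch of pushing that idea all the way to a vectorized form, the midpoints can be read off the categories with `IntervalIndex.mid` and broadcast to the rows by indexing with the integer category codes. This assumes the Series came from `pd.cut`, so its categories are an `IntervalIndex`; `mid_from_codes` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd

def mid_from_codes(s):
    """Midpoints for a categorical Series of Intervals, computed per category.

    Assumes s.cat.categories is an IntervalIndex (as produced by pd.cut).
    """
    mids = s.cat.categories.mid.to_numpy()  # one midpoint per unique interval
    codes = s.cat.codes.to_numpy()          # each row's category position; -1 = missing
    # Look up each row's midpoint by code, leaving missing rows as NaN
    out = np.where(codes >= 0, mids[codes], np.nan)
    return pd.Series(out, index=s.index, name=s.name)

rng = np.random.default_rng(0)
s = pd.cut(pd.Series(rng.integers(1, 100, 3000)), 3)
result = mid_from_codes(s)
```

Missing values (code `-1`) come back as `NaN` instead of silently picking up the last category's midpoint, which is what plain `mids[codes]` would do.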

I suspect `map` is slow here since it's mapping an interval. But good question. — Erfan, Feb 14 '20 at 21:49
@Erfan yeah, probably. I know that `map` is only optimized for some very specific inputs. But even when comparing `lambda x: x.mid` with something trivial like `lambda x: 1`, the `x.mid` version seems to be doing something that makes it so fast — ALollz, Feb 14 '20 at 21:56
Weird, your code doesn't run on my system. It throws `TypeError: Cannot cast CategoricalIndex to dtype float64` — Quang Hoang, Feb 14 '20 at 22:00
@QuangHoang hmm weird. It worked on mine with both `0.25.0` and `1.0.0` — ALollz, Feb 14 '20 at 22:05
Nice question, looking forward to the answer. Gratz on 30k :) — Celius Stingher, Feb 14 '20 at 22:30
Have you looked at the source code? (I'm not very familiar with it myself, unfortunately) — AMC, Feb 14 '20 at 23:46

0 Answers