
I had always thought that Series.apply was a loop over the rows, and we know that DataFrame.apply(axis=1) scales horribly: https://stackoverflow.com/a/55557758/4333359.

But when computing the midpoints of a category-dtype Series of pandas._libs.interval.Interval objects, I noticed that Series.apply does not seem to be looping at all: it is as fast as something like Series.map.

In the following example I get the midpoint of each Interval either by applying a lambda (should be slow?) or by doing something I thought would be very fast: looping over only the 3 unique intervals to build a dict of midpoints, then mapping the Series with it.

So what's going on? Why is Series.apply so fast here, or is the map perhaps just very slow?

import perfplot
import pandas as pd
import numpy as np

def apply_mid(s):
    # Call the lambda through Series.apply to get each Interval's midpoint.
    return s.apply(lambda x: x.mid)

def map_mid(s):
    # Compute the midpoint once per unique category, then map the Series.
    d = {x: x.mid for x in s.cat.categories}
    return s.map(d)

perfplot.show(
    # 3*n random integers binned into 3 equal-width intervals (category dtype).
    setup=lambda n: pd.cut(pd.Series(np.random.randint(1, 100, 3*n)), 3),
    kernels=[apply_mid, map_mid],
    labels=['apply', 'map'],
    n_range=[2 ** k for k in range(24)],
    equality_check=np.allclose,
    xlabel='~len(s)'
)
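
A minimal sketch of one further check, assuming (unconfirmed against the pandas source) that the category dtype is what makes the difference: time the same lambda on the category-dtype Series and on an object-dtype copy holding the same Interval values. The names s_cat, s_obj and get_mid are only illustrative, and the size and repeat count are arbitrary.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Category-dtype Series whose 3 categories are Interval objects.
s_cat = pd.cut(pd.Series(rng.integers(1, 100, 30_000)), 3)
# Same Interval values, but stored as plain Python objects.
s_obj = s_cat.astype(object)

def get_mid(x):
    return x.mid

# Time the identical function on both representations.
t_cat = timeit.timeit(lambda: s_cat.apply(get_mid), number=3)
t_obj = timeit.timeit(lambda: s_obj.apply(get_mid), number=3)

print(f"category dtype: {t_cat:.4f} s")
print(f"object dtype:   {t_obj:.4f} s")
```

If the two timings differ by a large factor, that would suggest the speed comes from how apply handles the categorical dtype rather than from anything special about x.mid.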

[Figure: perfplot output comparing the runtimes of the apply and map kernels]

ALollz
  • I suspect `map` is slow here since it's mapping an interval. But good question. – Erfan Feb 14 '20 at 21:49
  • @Erfan yeah, probably. I know that `map` is only optimized for some very specific inputs. But even just comparing `lambda x: x.mid` to a lambda that does something silly like `lambda x: 1`, the `x.mid` version seems to be doing something special to be so fast – ALollz Feb 14 '20 at 21:56
  • Weird, your code doesn't run on my system. It throws `TypeError: Cannot cast CategoricalIndex to dtype float64` – Quang Hoang Feb 14 '20 at 22:00
  • @QuangHoang hmm weird. It worked on mine with both `0.25.0` and `1.0.0` – ALollz Feb 14 '20 at 22:05
  • Nice question, looking forward to the answer. Gratz on 30k :) – Celius Stingher Feb 14 '20 at 22:30
  • Have you looked at the source code? (I'm not very familiar with it myself, unfortunately) – AMC Feb 14 '20 at 23:46

0 Answers