Why is pandas apply much slower than DataFrame merge?

Question

From my previous question, I know that `apply` is much slower than merging dataframes directly.

But I am still confused about why it is that much slower. As I understand it, if a dataframe has N rows, `apply` should run in O(N) time.

Could anyone explain the theory behind `apply` and dataframe merge? Or are there any resources where I can study this?

Thanks in advance :)

AFAIK `apply` uses a Python interpreted function, while built-in operations are C compiled functions. — Mephy, Aug 02 '16 at 03:08
Hi Mephy, could you share a link about that? I know Python is slower than C, but I saw `apply` run more than 100 times slower than dataframe merge, so I guess it is not only down to the language :) — linpingta, Aug 02 '16 at 03:10

score 1 · Accepted Answer · answered Aug 02 '16 at 03:28

Yes, the language difference alone can account for it. With equivalent asymptotics, Python can be tens to hundreds of times slower than C, just because it's Python. As an applied mathematician with a lot of number-crunching experience, I can attest to that. See these benchmarks for an official source.
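To see the gap on your own machine, here is a minimal sketch that does the same key lookup twice: once with a row-wise `apply` and once with `merge`. The frame sizes, column names (`key`, `value`) and helper functions are made up for illustration and are not taken from the original question. The timings will vary by machine and pandas version, so treat the printed numbers as indicative only.

```python
import time

import numpy as np
import pandas as pd


def build_frames(n_rows, n_keys=1000, seed=0):
    """Build a hypothetical left frame of keys and a right lookup frame."""
    rng = np.random.default_rng(seed)
    left = pd.DataFrame({"key": rng.integers(0, n_keys, size=n_rows)})
    right = pd.DataFrame({"key": np.arange(n_keys), "value": rng.random(n_keys)})
    return left, right


def lookup_with_apply(left, right):
    """Look up each row's value with a Python function called once per row."""
    table = right.set_index("key")["value"]
    return left.apply(lambda row: table.loc[row["key"]], axis=1)


def lookup_with_merge(left, right):
    """Do the same lookup as a single left join inside pandas."""
    return left.merge(right, on="key", how="left")["value"]


left, right = build_frames(5_000)

t0 = time.perf_counter()
via_apply = lookup_with_apply(left, right)
t1 = time.perf_counter()
via_merge = lookup_with_merge(left, right)
t2 = time.perf_counter()

# Both approaches produce the same values; only the runtime differs.
print(f"apply: {t1 - t0:.4f}s  merge: {t2 - t1:.4f}s")
```

Both versions do work proportional to the number of rows, but the `apply` version runs the lambda in the interpreter for every row, while `merge` handles the whole join in one call.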

Remember that asymptotic complexity is about scaling only. Two algorithms can easily have the same complexity and yet differ in runtime by orders of magnitude. If, however, you find that the Python version slows down by a greater factor than the C version as the input grows (that is, doubling the input more than doubles the runtime when it is supposed to be linear), you could be dealing with an asymptotically significant algorithmic difference.
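The doubling check described above can be sketched with the standard library alone. The two functions below are hypothetical stand-ins, not pandas internals: both are O(N), but one loops in the interpreter and the other hands the loop to the C-implemented built-in `sum`. The constant-factor gap here is expected to be smaller than what you see with pandas, and exact timings depend on your machine.

```python
import time


def total_python_loop(values):
    """Sum the values with an explicit interpreted loop (O(N))."""
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    """Sum the values with the built-in sum, whose loop runs in C (O(N))."""
    return sum(values)


def time_call(func, values):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(values)
    return result, time.perf_counter() - start


# Double the input each time. If both are linear, each runtime should
# roughly double too, while the ratio between them stays about constant.
for n in (100_000, 200_000, 400_000):
    data = list(range(n))
    loop_result, loop_time = time_call(total_python_loop, data)
    builtin_result, builtin_time = time_call(total_builtin, data)
    assert loop_result == builtin_result
    print(f"n={n}: loop {loop_time:.4f}s, builtin {builtin_time:.4f}s")
```

If the loop time grew by much more than 2x per doubling while the built-in did not, that would point to an algorithmic difference rather than a constant-factor one.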

Thanks @bpachev. I still have two questions: 1. Is there any detailed documentation on the `apply` function? 2. Is there any way to avoid the slowdown if I want to transform data? :) — linpingta, Aug 02 '16 at 04:14
