
I compared the performance of some of Python 3's built-in sort functions against some hand-written algorithms whose complexities I know, and I saw some unexpected results that I would like to clarify.

I used the "perfplot" library to visualize and compare how the algorithms scale.

import numpy as np
import perfplot

# algorithms:


def sort(x):
    # NumPy's sort, O(n log n); the result is discarded, we only time it
    np.sort(x)
    return 0


def sort_c(x):
    # NumPy's complex sort on the same input
    np.sort_complex(x)
    return 0


def builtin(x):
    # Python's built-in Timsort
    sorted(x)
    return 0


def c_linear_inplace(x):
    # single linear pass; swaps adjacent out-of-order pairs in place
    # using the add/subtract trick (no temporary variable)
    for i in range(len(x) - 1):
        if x[i] > x[i + 1]:
            x[i] = x[i] + x[i + 1]
            x[i + 1] = x[i] - x[i + 1]
            x[i] = x[i] - x[i + 1]
    return 0


def c_linear_outplace(x):
    # single linear pass writing swapped pairs into a copy
    a = x.copy()
    for i in range(len(x) - 1):
        if x[i] > x[i + 1]:
            a[i] = x[i + 1]
            a[i + 1] = x[i]
    x = a.copy()  # rebinds the local name only
    return 0


def c_nlogn(x):
    # dummy kernel doing ~n * log2(n) assignments as an O(n log n) reference
    logn = int(np.ceil(np.log2(len(x))))
    for i in range(len(x) - 1):
        for j in range(logn):
            x[i] = 0
    return 0

# comparison:
perfplot.show(
    setup=np.random.rand,  # generates a random input array of length n for the kernels
    kernels=[
        sort,
        sort_c,
        builtin,
        c_linear_inplace,
        c_linear_outplace,
        c_nlogn,
    ],
    labels=["sort", "sort_c", "builtin", "check: linear inplace", "check: linear not in place", "check: nlogn"],
    n_range=[2 ** k for k in range(15)],  # list of input sizes; the "setup" function is called with each of these values
    xlabel="len(a)",
)

visual comparison

I expected all the sort functions to be close to the nlogn kernel, or at least less efficient than the linear ones, and I expected c_linear_inplace to be faster than c_linear_outplace. But all the built-in sort functions were much faster than the linear functions, and c_linear_outplace was actually faster than c_linear_inplace.

EDIT: As I see it, these are the complexities of the functions, with their constants:

- sort, sort_c, builtin: c*n*log2(n) for some c >= 1
- check: linear inplace: 6n
- check: linear not in place: 7n
- check: nlogn: 2n + 3n*log2(n)

I counted every for loop as 2*(number of iterations), for the check and the increment, and every "if" and every assignment (=) as O(1). It seems odd that "check: linear not in place", which even uses more memory, performs much better than "check: linear inplace", and that both are still worse than any of NumPy's sorts.
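
To make this accounting concrete, here is a back-of-the-envelope check (my numbers, plugging the counts above into the largest plotted size, n = 2^14):

    n = 2 ** 14
    print(6 * n)               # check: linear inplace      -> 98304
    print(7 * n)               # check: linear not in place -> 114688
    print(2 * n + 3 * n * 14)  # check: nlogn, log2(n) = 14 -> 720896

By these counts the linear kernels should beat the nlogn kernel by roughly 7x, which is what makes the measured plot surprising.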

  • I am not surprised that the built-in functions are faster than the functions you wrote, because they are most likely using C under the hood – Nikaido Sep 09 '19 at 17:06
  • Incidentally, if you want a language that's comparably high-level, but where code is compiled down to machine instructions with near-C performance, you might find [Julia](https://julialang.org/benchmarks/) interesting. As it is, you've got a bunch of constant-factor interpreter overhead. – Charles Duffy Sep 09 '19 at 17:11
  • Having a lower asymptotic complexity doesn't mean it's always faster. It just means that there exists a crossover point after which one will always win. In your case, the crossover point is probably just higher than 15k. – that other guy Sep 09 '19 at 17:16
  • Not related to the complexity, but note that your strategy for swapping two floating-point values in `c_linear_inplace` can actually end up changing those values, as a result of floating-point rounding. Much better (and likely also significantly faster) to do `x[i], x[i+1] = x[i+1], x[i]` instead. – Mark Dickinson Sep 09 '19 at 17:31
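
A minimal sketch of the tuple-swap variant suggested in the last comment (my code, not the OP's; the name c_linear_inplace_swap is made up):

    def c_linear_inplace_swap(x):
        # tuple assignment swaps the pair exactly, with no
        # floating-point rounding and no add/subtract round trips
        for i in range(len(x) - 1):
            if x[i] > x[i + 1]:
                x[i], x[i + 1] = x[i + 1], x[i]
        return 0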

2 Answers


You want to display your results on a "log-log" plot. Asymptotic/big-O notation approximates runtime by ignoring constant factors and lower-order terms, which can have a significant effect on actual runtime; hence the differences you're seeing.

If you did a log-log plot, you should get approximately straight lines appearing that are offset in height by these lower order constants. Also note that starting from a single element will tend to highlight the overhead of calling the function, rather than any asymptotic performance.

For example, this is what I get when running it with:

    n_range=[2 ** k for k in range(10, 17)],
    xlabel="len(a)", logx=True, logy=True,

demo: log-log plot of the same comparison

It's now more obvious that the height differences are just constant-factor differences in performance.
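
To read such a plot: the slope of each line approximates the exponent of n, and the vertical offsets are the constant factors. A rough way to estimate a slope from two sample points (the timings below are made up for illustration):

    import numpy as np

    # slope on a log-log plot = d(log t) / d(log n);
    # ~1.0 suggests O(n), while O(n log n) sits slightly above 1
    n1, n2 = 2 ** 12, 2 ** 16
    t1, t2 = 0.004, 0.071  # hypothetical timings in seconds
    slope = (np.log(t2) - np.log(t1)) / (np.log(n2) - np.log(n1))
    print(round(slope, 2))  # -> 1.04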


Being asymptotically faster doesn't mean it'll always run faster. It just means that after a certain value of n, it'll start to win. This n may even be outside your practical problem domain, so that you'll never see a problem where the asymptotically faster function is chronologically faster. This was the case in your plot.

If you zoom out from 0-16,000 (range(15)) to 0-67,000,000 (range(27)), the story is different. In particular, builtin is faster than check: linear inplace up until 40M, but then starts to lose:

Plot showing sorting being slower than linear as size increases

It's not unreasonable for there to be e.g. a 50x overhead of doing Python->C FFI to access an array value compared to C accessing an array directly. In that case, 50*n would not be faster than n*log2(n) until n > 2^50, so chances are you'll never see sort being slower than linear inplace on today's hardware.
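
Plugging that in (the 50x figure is an assumption for illustration, not a measurement):

    c = 50  # assumed per-element Python->C overhead factor
    # Python-level linear pass costs ~c*n; C-level sort costs ~n*log2(n).
    # They cross where log2(n) == c, i.e. at n == 2**c.
    n_cross = 2 ** c
    print(f"{n_cross:.2e}")  # -> 1.13e+15 elements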

  • 1) What does "Python->C FFI" mean? 2) Can you explain why "check: linear not in place" is faster than "check: linear inplace"? 3) What do you mean by "you'll never see sort faster than linear inplace"? The two NumPy sort functions are faster than "linear inplace" – Yuval Zilber Sep 10 '19 at 15:45
  • 1. A Foreign Function Interface is how you call functions or access data between two programming languages in a single process. It's how Python code accesses data in C NumPy arrays. 2. I'm not sure, but I'm guessing data dependencies. Every line in the in-place one depends on the result of the previous line, while the out-of-place one is very CPU-pipeline friendly. It would make a good question on its own. 3. Sorry, I flipped the two. It should be the other way around (fixed). – that other guy Sep 10 '19 at 18:04
  • 2. I was wrong, it's just down to the algorithm. The out-of-place one does 2 assignments, on average 50% of the time. The in-place version does 3 assignments roughly 100% of the time (since it always compares against the largest item it's seen). That's a 3x constant factor, which is roughly what you're seeing. – that other guy Sep 10 '19 at 19:54
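
A quick way to check that claim empirically (my sketch; the counting functions are hypothetical and use tuple swaps, but the comparison pattern matches the OP's kernels):

    import numpy as np

    def count_inplace(x):
        # the running maximum propagates rightward, so the condition
        # is true on almost every iteration -> ~3 assignments each
        swaps = 0
        for i in range(len(x) - 1):
            if x[i] > x[i + 1]:
                swaps += 1
                x[i], x[i + 1] = x[i + 1], x[i]
        return 3 * swaps

    def count_outplace(x):
        # x is never modified, so each comparison is between two
        # independent random values -> true only ~50% of the time
        a = x.copy()
        swaps = 0
        for i in range(len(x) - 1):
            if x[i] > x[i + 1]:
                swaps += 1
                a[i], a[i + 1] = x[i + 1], x[i]
        return 2 * swaps

    x = np.random.rand(10 ** 4)
    print(count_inplace(x.copy()))   # ~30000, i.e. ~3n
    print(count_outplace(x.copy()))  # ~10000, i.e. ~2n * 0.5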