I was looking for an efficient way to calculate the nth largest value in a numpy array, and this answer led me to np.partition.
By the way, I have noticed that a naive full sort is faster than the np.partition approach for arrays shorter than about 100 entries. (For large arrays, by contrast, the gain is evident.)
Why is np.partition's run time almost flat for small arrays?
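For context, a minimal sketch of the pattern I mean, fetching the nth largest element via a partial sort (the array values and n here are just illustrative):

```python
import numpy as np

# Illustrative data; n = 3 means "3rd largest".
a = np.array([7, 2, 9, 4, 11, 5])
n = 3

# np.partition(a, kth) places the element at index kth in its sorted
# position, smaller values before it and larger values after it,
# without fully sorting the array.
nth_largest = np.partition(a, -n)[-n]

# Sanity check against a full sort.
assert nth_largest == np.sort(a)[-n]
print(nth_largest)  # → 7
```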
Code to generate the figure:
import numpy as np
import timeit

def func_1(inp):
    # Partial sort: only guarantees the element at index 10 is in place
    return np.partition(inp, 10)[10]

def func_2(inp):
    # Full sort, then index
    return np.sort(inp)[10]

a = []
b = []
N_tests = int(1e5)
for wdw in range(20, 1000, 10):
    print(wdw)
    res1 = timeit.timeit("func_1(test)",
        setup="import numpy as np; wdw_size = %d; test = np.random.randn(wdw_size); from __main__ import func_1" % wdw,
        number=N_tests)
    a.append(res1)
    res2 = timeit.timeit("func_2(test)",
        setup="import numpy as np; wdw_size = %d; test = np.random.randn(wdw_size); from __main__ import func_2" % wdw,
        number=N_tests)
    b.append(res2)

import matplotlib.pyplot as plt
plt.plot(range(20, 1000, 10), a, range(20, 1000, 10), b)
plt.legend(['np.partition', 'np.sort'])
plt.xlabel('Array Size')
plt.ylabel('Time (s)')
plt.show()