
I was looking at the code of the numpy.argmax function, and I am confused about which data structure NumPy maintains for argmax.

https://numpy.org/doc/stable/reference/generated/numpy.argmax.html

Eventually, I want to know the theoretical average-case running time complexity of the numpy argmax function for primitive data types. Is it O(log N) or O(N) in the average case?

This may be a relevant question as well: Faster alternatives to numpy.argmax/argmin which is slow

Thanks in advance.

Mazharul Islam

2 Answers


Here is a performance analysis using benchit:

import numpy as np
import benchit

def m(x):
    return np.argmax(x)

in_ = [np.random.rand(n) for n in [10, 100, 1000, 10000]]
t = benchit.timings([m], in_)  # benchmark m over each input size
t.plot(logx=True)

As you can see, it is O(N), as it should be: you iterate over the array once to find the maximum.

[benchit plot: argmax runtime vs. array size]
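Conceptually, the scan is just a single pass; roughly, in pure Python (NumPy's actual implementation is compiled C, but the asymptotics are the same):

import numpy as np

def argmax_linear(a):
    """One pass over the array: O(N) comparisons, no auxiliary structure."""
    best_idx = 0
    for i in range(1, len(a)):
        if a[i] > a[best_idx]:  # strict > keeps the first occurrence, like np.argmax
            best_idx = i
    return best_idx

x = np.random.rand(1000)
assert argmax_linear(x) == np.argmax(x)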

Ehsan
  • Are you sure you are timing only the argmax function? Even on hardware from a couple of years ago, an argmax over 1000 values taking ~3 s is unheard of. The results should also differ markedly depending on where the array sits in your memory hierarchy. For example, my results show essentially exponential growth until about 10,000 elements before becoming linear with a 1:1 factor (5e-10 s per value in the array). Your results show less than 1:1 scaling, though: a 10x increase in array size only adds 3x time. – alwaysmvp45 Apr 05 '23 at 14:14
  • @alwaysmvp45 It looks like the x-axis is log-scaled, which is in line with both of your statements. As for the absolute time of 3 s for an array of 1000 elements, I am not sure how benchit calculates it. I think the idea is to see that argmax is O(N); the absolute timing, as you said, would depend on the hardware architecture. – Ehsan Apr 06 '23 at 17:05

Going to add a response here, since the numbers naturally depend on your hardware, and this should provide some insight for others. Here is a benchit result showing how memory structure also impacts the results: the linear portion kicks in for vectors of around 10,000 elements, with 1:1 scaling, and a good approximation for vectors of >10,000 samples is 5e-10 s per value in the vector.

[benchit plot: argmax runtime vs. array size, showing memory-structure effects]
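To sanity-check the per-value figure on your own hardware, here is a minimal timeit sketch (the sizes and repetition counts are my own arbitrary choices, not from benchit):

import timeit
import numpy as np

# Rough per-element cost check. Absolute numbers are hardware-dependent;
# the ~5e-10 s/value figure above came from my machine.
for n in [1_000, 100_000, 10_000_000]:
    x = np.random.rand(n)
    reps = max(1, 10_000_000 // n)  # keep total work roughly constant
    t = timeit.timeit(lambda: np.argmax(x), number=reps) / reps
    print(f"n={n:>10,}  mean={t:.2e} s  per-element={t / n:.2e} s")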

alwaysmvp45