
I am doing a performance test in Python. Indexing large arrays in different ways gives me quite different performance. Here is an example from my current project.

import numpy as np
import time

N=100000
i=0
rep=1000

x1=np.random.randn(N,4)
x2=np.random.randn(4,N)
G1=np.random.randn(N,4,3)
G2=np.random.randn(4,N,3)
B=np.random.randn(N,3)


# Version 1: i indexes the second axis of G1
starttime=time.time()
for k in range(rep):
    x1[:,i]=(G1[:,i,:] * B[:,:]).sum(axis=1)
elapsedtime1=time.time()-starttime

# Version 2: i indexes the first axis of G2
starttime=time.time()
for k in range(rep):
    x2[i,:]=(G2[i,:,:] * B[:,:]).sum(axis=1)
elapsedtime2=time.time()-starttime

print("elapsedtime1= "+str(elapsedtime1))
print("elapsedtime2= "+str(elapsedtime2))
diff=(elapsedtime2-elapsedtime1)/elapsedtime2
print("diff= "+str(diff))

I got these results:

% python test.py
elapsedtime1= 2.46446800232
elapsedtime2= 1.52360200882
diff= -0.617527404173

In other words, the two computations differ in performance by roughly 60%. Is this expected?

zell
    Does this answer your question? [Performance of row vs column operations in NumPy](https://stackoverflow.com/questions/17954990/performance-of-row-vs-column-operations-in-numpy) – mkrieger1 Jun 22 '20 at 19:25
  • Thanks. Answers to that link are a bit high-level. How would you map those answers to my concrete problem? – zell Jun 22 '20 at 19:48
  • In the 2nd case the `i` dimension is the first, outermost. So the selection `G2[i,:,:]` is contiguous, more compact than `G1[:,i,:]`. So some speed advantage for `G2` is reasonable (especially with this large `N`). The use of `strides` reduces `numpy`'s sensitivity to axes, but it doesn't eliminate it. – hpaulj Jun 22 '20 at 19:49
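
For reference, the contiguity difference hpaulj describes can be checked directly on the views themselves. This is a small sketch added for illustration, not part of the original post:

import numpy as np

N = 100000
G1 = np.random.randn(N, 4, 3)
G2 = np.random.randn(4, N, 3)

# G2[0, :, :] selects along the first (outermost) axis, so the resulting view
# is one contiguous block of memory; G1[:, 0, :] skips over the other columns,
# so consecutive rows are 96 bytes apart instead of 24.
print(G1[:, 0, :].flags['C_CONTIGUOUS'], G1[:, 0, :].strides)  # False (96, 8)
print(G2[0, :, :].flags['C_CONTIGUOUS'], G2[0, :, :].strides)  # True  (24, 8)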

1 Answer


As already explained in the comments, the difference in performance is probably due to the indexing.

However, I have noticed when working with numpy that efficiency is poor when calling sum on an axis with a small dimension (3 in your case).

For your test case, you can get an additional ~100% speed-up by replacing the call to the sum function with a matrix product against a vector of ones.

Here I added a dimension variable D (D=3 in your case):

import numpy as np
import time

N=100000
i=0
rep=1000

D = 3

x1=np.random.randn(N,4)
x2=np.random.randn(4,N)
G1=np.random.randn(N,4,D)
G2=np.random.randn(4,N,D)
B=np.random.randn(N,D)

ones=np.ones((D,))

# Version 1: reduce over the last axis with sum
starttime=time.time()
for k in range(rep):
    x1[:,i]=(G1[:,i,:] * B[:,:]).sum(axis=1)
elapsedtime1=time.time()-starttime

# Version 2: the same reduction expressed as a matrix product with ones
starttime=time.time()
for k in range(rep):
    x1[:,i]=(G1[:,i,:] * B[:,:]) @ ones
elapsedtime2=time.time()-starttime

print("elapsedtime1= "+str(elapsedtime1))
print("elapsedtime2= "+str(elapsedtime2))
diff=(elapsedtime2-elapsedtime1)/elapsedtime2
print("diff12= "+str(diff))

I got these results:

% python3 script.py
elapsedtime1= 2.2359278202056885
elapsedtime2= 1.1143040657043457
diff12= -1.006568843300747

Note that the speed-up remains even if the vector ones is created on the fly, although the speed-up decreases as the dimension D increases.
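
As a minimal sketch of that on-the-fly variant (reusing the variable names from the script above; the timing claim is the one stated above, not re-measured here):

import numpy as np

N, D, i = 100000, 3, 0
G1 = np.random.randn(N, 4, D)
B = np.random.randn(N, D)
x1 = np.random.randn(N, 4)

# The ones vector is built inside the expression; the (N, D) @ (D,) product
# still replaces the slower .sum(axis=1) reduction over the small last axis.
x1[:, i] = (G1[:, i, :] * B) @ np.ones(D)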

Hope this helps.

bousof