Benchmarking the vectorized solutions
We are looking to benchmark the vectorized solutions in this post. The point of vectorizing is to avoid looping over each row and doing the shuffling there, so the setup for the input array involves a large number of rows.
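For reference, here is a minimal sketch of the per-row loop that the vectorized solutions are meant to replace (app0_loopy is my name for it; it is not one of the benchmarked approaches):

import numpy as np

def app0_loopy(a):
    # reference-only baseline: shuffle each row's non-zero values in place
    for row in a:
        nz = np.flatnonzero(row)
        row[nz] = np.random.permutation(row[nz])
    return a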
Approaches -
def app1(a): # @Daniel F's soln
    i, j = np.nonzero(a.astype(bool))
    # argsort of row index + uniform noise: rows stay grouped,
    # but the order within each row becomes random
    k = np.argsort(i + np.random.rand(i.size))
    a[i,j] = a[i,j[k]]
    return a
def app2(x): # @kazemakase's soln
    r, c = np.where(x != 0)
    n = c.size
    perm = np.random.permutation(n)
    i = np.argsort(perm + r * n)
    x[r, c] = x[r, c[i]]
    return x
def app3(a): # @Divakar's soln
    m,n = a.shape
    rand_idx = np.random.rand(m,n).argsort(axis=1)  # a random permutation per row
    b = a[np.arange(m)[:,None], rand_idx]
    # write the shuffled non-zeros back into the original non-zero slots
    a[a!=0] = b[b!=0]
    return a
from scipy.ndimage import labeled_comprehension

def app4(a): # @FabienP's soln
    def func(array, idx):
        r[idx] = np.random.permutation(array)
        return True
    labels, idx = nz = a.nonzero()
    r = a[nz]
    labeled_comprehension(a[nz], labels + 1, np.unique(labels + 1),\
                          func, int, 0, pass_positions=True)
    a[nz] = r
    return a
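Before timing anything, a quick sanity check (my addition, not part of the original benchmark) confirms that each approach keeps zeros in place and only permutes each row's non-zero values:

def check(func):
    # zero positions must be unchanged; each row's multiset of
    # non-zero values must be preserved
    a = np.array([[2,3,1,0],[0,0,2,1]])
    out = func(a.copy())  # all four approaches mutate in place, so pass a copy
    assert ((a == 0) == (out == 0)).all()
    for r_in, r_out in zip(a, out):
        assert np.array_equal(np.sort(r_in[r_in != 0]), np.sort(r_out[r_out != 0]))
    print(func.__name__, 'OK')

for f in (app1, app2, app3, app4):
    check(f)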
Benchmarking procedure #1
For fair benchmarking, it seemed reasonable to use the OP's sample and simply stack it as more rows to get a bigger dataset. Thus, with that setup we could create two cases, with 2-million and 20-million row datasets.
Case #1 : Large dataset with 2*1,000,000 rows
In [174]: a = np.array([[2,3,1,0],[0,0,2,1]])

In [175]: a = np.vstack([a]*1000000)

In [176]: %timeit app1(a)
     ...: %timeit app2(a)
     ...: %timeit app3(a)
     ...: %timeit app4(a)
     ...: 
1 loop, best of 3: 264 ms per loop
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 254 ms per loop
1 loop, best of 3: 14.3 s per loop
Case #2 : Larger dataset with 2*10,000,000 rows
In [177]: a = np.array([[2,3,1,0],[0,0,2,1]])

In [178]: a = np.vstack([a]*10000000)

# app4 skipped here as it was slower on the previous smaller dataset
In [179]: %timeit app1(a)
     ...: %timeit app2(a)
     ...: %timeit app3(a)
     ...: 
1 loop, best of 3: 2.86 s per loop
1 loop, best of 3: 4.62 s per loop
1 loop, best of 3: 2.55 s per loop
Benchmarking procedure #2 : Extensive one
To cover all cases of varying percentages of non-zeros in the input array, we run some more extensive benchmarking scenarios. Also, since app4 proved much slower than the others, it is skipped in this section so the remaining three can be inspected more closely.
Helper function to set up the input array :
def in_data(n_col, nnz_ratio):
    # Max. number of elements that my system can handle, i.e. stretching
    # it to the limits. The idea is to use this to decide the number of
    # rows and always use the max. possible dataset size.
    num_elem = 10000000
    n_row = num_elem//n_col
    a = np.zeros((n_row, n_col), dtype=int)
    L = int(round(a.size*nnz_ratio))
    a.ravel()[np.random.choice(a.size, L, replace=False)] = np.random.randint(1,6,L)
    return a
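As a quick usage check (my example; note the helper always allocates roughly 10 million elements), the achieved sparsity matches the requested ratio:

a = in_data(n_col=8, nnz_ratio=0.4)
print(a.shape)                              # (1250000, 8)
print(np.count_nonzero(a) / float(a.size))  # 0.4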
Main timing and plotting script (uses IPython magic functions, so it needs to be run by copying and pasting into an IPython console) -
import matplotlib.pyplot as plt

# Setup input params
nnz_ratios = np.array([0.2, 0.4, 0.6, 0.8])
n_cols = np.array([4, 5, 8, 10, 15, 20, 25, 50])

init_arr1 = np.zeros((len(nnz_ratios), len(n_cols)))
init_arr2 = np.zeros((len(nnz_ratios), len(n_cols)))
init_arr3 = np.zeros((len(nnz_ratios), len(n_cols)))

timings = {app1:init_arr1, app2:init_arr2, app3:init_arr3}
for i,nnz_ratio in enumerate(nnz_ratios):
    for j,n_col in enumerate(n_cols):
        a = in_data(n_col, nnz_ratio=nnz_ratio)
        for func in timings:
            res = %timeit -oq func(a)  # -o returns the result object, -q runs quietly
            timings[func][i,j] = res.best
            print(func.__name__, i, j, res.best)

fig = plt.figure(1)
colors = ['b','k','r']
for i in range(len(nnz_ratios)):
    ax = plt.subplot(2,2,i+1)
    for f,func in enumerate(timings):
        ax.plot(n_cols,
                [time for time in timings[func][i]],
                label=str(func.__name__), color=colors[f])
    ax.set_xlabel('No. of cols')
    ax.set_ylabel('time [seconds]')
    ax.grid(which='both')
    ax.legend()
    plt.tight_layout()
    plt.title('Percentage non-zeros : '+str(int(100*nnz_ratios[i])) + '%')
plt.subplots_adjust(wspace=0.2, hspace=0.2)
Timings output -

[Timing plots: runtime (seconds) vs. number of columns for app1, app2 and app3, one panel per non-zero percentage: 20%, 40%, 60%, 80%]
Observations :
Approaches #1 and #2 do an argsort on the non-zero elements across the entire input array, so their work scales with the number of non-zeros. As such, they perform better at lower percentages of non-zeros.
Approach #3 creates random numbers of the same shape as the input array and then gets argsort indices per row, so its work depends on the full array size rather than the non-zero count. Thus, for a given number of non-zeros in the input, its timings climb more steeply than those of the first two approaches.
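To make that difference concrete, here is a rough count (my illustration, reusing the in_data helper above) of how many elements each camp ends up argsorting in one of the sparse settings:

a = in_data(n_col=20, nnz_ratio=0.2)
print('elements argsorted by app1/app2 :', np.count_nonzero(a))  # 2,000,000
print('elements argsorted by app3      :', a.size)               # 10,000,000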
Conclusion :
Approach #1 seems to do pretty well up to the 60% non-zero mark. With more non-zeros, and if the row lengths are small, approach #3 seems to perform decently.
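One possible way to act on that conclusion is a small dispatcher on the measured non-zero ratio; this is just a sketch, and the 0.6 threshold is simply the crossover suggested by these plots:

def shuffle_nonzeros(a, dense_threshold=0.6):
    # sketch: prefer app1 for sparse inputs, app3 for dense ones
    nnz_ratio = np.count_nonzero(a) / float(a.size)
    return app1(a) if nnz_ratio < dense_threshold else app3(a)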