Formatting the arrangement of two arrays in order of one in Numpy Python

Question

I am trying to order the zeroes and ones in arrangement of the order. The expected output is what I am trying to get to. Without using a list comprehension preferably.

import numpy as np

order = np.array([0,1,0,1,0])
zeroes= np.array([10,55, 30])
ones = np.array([3,8])

Expected Output

[10, 3, 55, 8, 30]

Pierre D · Accepted Answer · 2021-02-14T05:02:00.270

How about this (no Python loops: 750x faster than a list comprehension, when tested on 200k elements):

# note: updated version: faster and more robust to faulty input
def altcat(zeroes, ones, order):
    i0 = np.nonzero(order == 0)[0][:len(zeroes)]
    i1 = np.nonzero(order == 1)[0][:len(ones)]
    
    z = np.zeros_like(order, dtype=zeroes.dtype)
    z[i0] = zeroes[:len(i0)]
    z[i1] = ones[:len(i1)]
    return z

On your example:

>>> altcat(zeroes=np.array([10,55, 30]), ones=np.array([3,8]),
...        order=np.array([0,1,0,1,0]))
array([10,  3, 55,  8, 30])

Speed

# set up
n = 200_000
np.random.seed(0)
order = np.random.randint(0, 2, size=n)
n1 = order.sum()
n0 = n - n1
ones = np.random.randint(100, size=n1)
zeroes = np.random.randint(100, size=n0)

# for comparison, a method proposed elsewhere, based on lists
def altcat_list(zeroes, ones, order):
    zeroes = list(zeroes)
    ones = list(ones)
    return [zeroes.pop(0) if i == 0 else ones.pop(0) for i in order]

Test:

a = %timeit -o altcat(zeroes, ones, order)
# 2.38 ms ± 573 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

b = %timeit -o altcat_list(zeroes, ones, order)
# 1.84 s ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

b.average / a.average
# 773.59

Note: I initially tried with n = 1_000_000, but while altcat does that in 12.4ms, the list-based version would take forever and I had to stop it.

It seems that the list-based method is worse than O(n) (100K: 0.4s; 200K: 1.84s; 400K: 10.4s).

Addendum

If you really want to do it with a list comprehension and not in pure numpy, then at least consider this:

def altcat_list_mod(zeroes, ones, order):
    it = [iter(zeroes), iter(ones)]
    return [next(it[i]) for i in order]

That's faster than altcat_list(), but still almost 25x slower than altcat():

# on 200k elements

c = %timeit -o altcat_list_mod(zeroes, ones, order)
# 60 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

c.average / a.average
# 24.93

I was just going through the same process, and interestingly I get *exactly* the same results as you. We must have very similar computers. This is definitely the preferred solution. — Nick, Feb 14 '21 at 04:15
check out the modified version: even faster, and more robust to faulty input, e.g.: `altcat(zeroes=np.array([10,55, 30]), ones=np.array([3,8]), order=np.array([0,1,1,1,2])` — Pierre D, Feb 14 '21 at 04:57

Formatting the arrangement of two arrays in order of one in Numpy Python

1 Answers1