
I want to create a 1D array that consists of alternating runs of ones and zeros, where In1 gives the length of each run of ones and In2 the length of the run of zeros that follows it. For example:

import numpy as np

In1 = np.array([2, 1, 3])
In2 = np.array([1, 1, 2])

Out1 = np.array([])

for idx in range(In1.size):
    Ones = np.ones(In1[idx])
    Zeros = np.zeros(In2[idx])

    Out1 = np.concatenate((Out1, Ones, Zeros))

print(Out1)
# [1. 1. 0. 1. 0. 1. 1. 1. 0. 0.]

Is there a more efficient way to do this that doesn't use a for loop?

Divakar
Al-Baraa El-Hag

3 Answers


Using np.repeat:

(np.arange(1,1+In1.size+In2.size)&1).repeat(np.array([In1,In2]).reshape(-1,order="F"))
# array([1, 1, 0, 1, 0, 1, 1, 1, 0, 0])
Paul Panzer
  • Not pretty but seems good on performance if the island lengths are not huge. Guess we can optimize further by repeating on a boolean array. – Divakar Jul 02 '20 at 13:15
  • @Divakar I'm done with pretty ;-) I think this one is good for small arrays, for larger ones yours seem faster. – Paul Panzer Jul 02 '20 at 13:26
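Divakar's comment suggests repeating on a boolean array instead of an integer one. The sketch below is one reading of that suggestion (an interpretation, not code posted in the thread), with the one-liner split into named steps. The names `counts` and `pattern` are illustrative only. Repeating a 1-byte boolean array moves less memory than repeating an int64 array, and `repeat` accepts a count of 0, so zero-length runs simply disappear.

```python
import numpy as np

In1 = np.array([2, 1, 3])  # lengths of the runs of ones
In2 = np.array([1, 1, 2])  # lengths of the runs of zeros

# Interleave the run lengths: [In1[0], In2[0], In1[1], In2[1], ...]
counts = np.column_stack((In1, In2)).ravel()

# One boolean per run: True for a run of ones, False for a run of zeros
pattern = np.tile(np.array([True, False]), In1.size)

# Repeat each boolean by its run length, then view the bools as int8 0/1
out = pattern.repeat(counts).view(np.int8)
print(out)  # [1 1 0 1 0 1 1 1 0 0]
```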

Here's a vectorized one using cumsum -

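L = In1.sum() + In2.sum()          # total output length
idar = np.zeros(L, dtype=int)

s = In1+In2                        # length of each ones+zeros block
starts = np.r_[0,s[:-1].cumsum()]  # index where each run of ones begins
stops = In1+starts                 # index where each run of zeros begins
idar[starts] = 1                   # step up at the start of a ones run
idar[stops] = -1                   # step down at the start of a zeros run
out = idar.cumsum()                # the running sum yields the 0/1 pattern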
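The cumsum code above assumes every entry of In1 and In2 is positive, as in the question. If zero-length runs can occur, two things break: a trailing 0 in In2 puts the last stop at index L, which raises an IndexError, and any other 0 makes a start and a stop land on the same index, where the later fancy-index assignment overwrites the earlier one instead of the two cancelling. A minimal sketch, assuming zero-length runs are possible in your data, uses `np.add.at` (which accumulates on repeated indices) and one spare slot. `runs_cumsum` is a hypothetical helper name.

```python
import numpy as np

def runs_cumsum(In1, In2):
    """Alternating runs of ones/zeros via cumsum, tolerating zero-length runs."""
    In1 = np.asarray(In1)
    In2 = np.asarray(In2)
    L = In1.sum() + In2.sum()
    s = In1 + In2
    starts = np.r_[0, s[:-1].cumsum()]  # where each run of ones begins
    stops = starts + In1                # where each run of zeros begins
    # One spare slot so a stop equal to L (trailing empty zeros run) stays in bounds
    idar = np.zeros(L + 1, dtype=int)
    # np.add.at accumulates on repeated indices, so a start and a stop
    # at the same position cancel out instead of the later write winning
    np.add.at(idar, starts, 1)
    np.add.at(idar, stops, -1)
    return idar.cumsum()[:L]

print(runs_cumsum([2, 0, 3], [1, 2, 0]))  # [1 1 0 0 0 1 1 1]
```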

Alternatively, if the slices are large or just to achieve memory efficiency, we might want to use a loop with just slicing to assign 1s -

# Re-using L, starts, stops from earlier approach
out = np.zeros(L, dtype=bool)
for (i,j) in zip(starts,stops):
    out[i:j] = 1           # fill one run of ones by slicing
out = out.view('i1')       # reinterpret the bools as int8 0/1 without copying
Divakar
  • That cumsum approach is amazing in its elegance. Which approach do you recommend? My assumption is that cumsum is optimized in C, so it will be the fastest to run. My bottleneck is CPU speed, not RAM. – Al-Baraa El-Hag Jul 02 '20 at 12:54
  • @Al-BaraaEl-Hag Yeah, for a large number of entries in `In1` and `In2`, you would see the vectorized one doing better. – Divakar Jul 02 '20 at 13:14
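Which approach wins depends on the number of runs and their lengths, so it is worth timing on data shaped like yours. A minimal timing sketch follows, assuming random positive run lengths; the `via_*` function names, the run count `n`, and the length range are hypothetical choices for illustration. It first checks that all approaches produce the same 0/1 sequence, then prints per-call timings for your machine; no particular result is implied.

```python
import timeit
import numpy as np

def via_loop_concat(In1, In2):
    # The question's approach: one concatenate per run, re-copying the output
    out = np.array([])
    for a, b in zip(In1, In2):
        out = np.concatenate((out, np.ones(a), np.zeros(b)))
    return out

def via_repeat(In1, In2):
    # The np.repeat one-liner from the first answer
    return (np.arange(1, 1 + In1.size + In2.size) & 1).repeat(
        np.array([In1, In2]).reshape(-1, order="F"))

def via_cumsum(In1, In2):
    # The cumsum approach (assumes all run lengths are positive)
    L = In1.sum() + In2.sum()
    idar = np.zeros(L, dtype=int)
    starts = np.r_[0, (In1 + In2)[:-1].cumsum()]
    stops = In1 + starts
    idar[starts] = 1
    idar[stops] = -1
    return idar.cumsum()

def via_slicing(In1, In2):
    # The loop that assigns each run of ones by slicing into a bool array
    L = In1.sum() + In2.sum()
    starts = np.r_[0, (In1 + In2)[:-1].cumsum()]
    stops = In1 + starts
    out = np.zeros(L, dtype=bool)
    for i, j in zip(starts, stops):
        out[i:j] = 1
    return out.view('i1')

funcs = [via_loop_concat, via_repeat, via_cumsum, via_slicing]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1_000  # number of runs: an arbitrary size for illustration
    In1 = rng.integers(1, 20, size=n)
    In2 = rng.integers(1, 20, size=n)

    reference = via_loop_concat(In1, In2)
    for f in funcs:
        # Values must match even though the dtypes differ (float, int, int8)
        assert np.array_equal(f(In1, In2), reference), f.__name__
        t = timeit.timeit(lambda: f(In1, In2), number=10)
        print(f"{f.__name__:16s} {t / 10 * 1e3:.3f} ms per call")
```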

I did this with map. In my opinion, the most time-consuming part of your code is the repeated concatenation, so I replaced it with Python lists (based on this).

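import numpy as np
from itertools import chain

def creator(i):
    # Run i as a plain list: In1[i] ones followed by In2[i] zeros
    return [1] * int(In1[i]) + [0] * int(In2[i])

nested = map(creator, range(len(In1)))
flatten = np.array(list(chain.from_iterable(nested)))
print(flatten)  # [1 1 0 1 0 1 1 1 0 0]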
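The diagnosis in this answer (the repeated concatenation is the expensive part) also suggests a smaller change to the question's own loop: collect the pieces in a list and call `np.concatenate` once. Each `np.concatenate` call allocates a new array and copies everything accumulated so far, so calling it once per run makes the total copying grow roughly quadratically with the number of runs. This is a common NumPy pattern, not code from the thread; `pieces` is an illustrative name.

```python
import numpy as np

In1 = np.array([2, 1, 3])
In2 = np.array([1, 1, 2])

# Collect every run in a Python list first ...
pieces = []
for a, b in zip(In1, In2):
    pieces.append(np.ones(a))
    pieces.append(np.zeros(b))

# ... then concatenate once, so each element is copied a single time
Out1 = np.concatenate(pieces)
print(Out1)  # [1. 1. 0. 1. 0. 1. 1. 1. 0. 0.]
```

It keeps the loop, but avoids re-copying the growing output on every iteration.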