
I have an array a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9]) that continues in the same pattern (runs of repeated values). I would like to shift this array, but I'm not sure whether np.roll() can be used here.

The array I would like to produce is [0, 0, 0, 2, 2, 3, 15, 15, 7].

As you can see, the first run of equal numbers in array a (in this case the three '2's) should be replaced with '0's. Everything else should then be shifted by one run, so that the '3's are replaced with '2's, the '15' is replaced with '3', and so on. Ideally I would like to do this without any for loop, as I need it to run quickly.

I realise this operation may be a bit confusing so please ask questions.

Matt Hall
Alex Pharaon

4 Answers


This does not use NumPy, but one approach that comes to mind is to use itertools.groupby to collect contiguous runs of equal elements. Then shift the run values by one (by prepending a 0) and use the run lengths to repeat them.

from itertools import chain, groupby

def shift(data):
    values = [(k, len(list(g))) for k,g in groupby(data)]
    keys = [0] + [i[0] for i in values]
    reps = [i[1] for i in values]
    return list(chain.from_iterable([[k]*rep for k, rep in zip(keys, reps)]))

For example

>>> a = np.array([2,2,2,3,3,15,7,7,9])
>>> shift(a)
[0, 0, 0, 2, 2, 3, 15, 15, 7]
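
Because groupby works on contiguous runs, this approach should also cope with a value that reappears later in the array. The following is a minimal sketch of the same idea, not the answer's exact code; the `fill` parameter is a hypothetical addition for illustration, and a plain Python list is used as input so the printed result does not depend on how NumPy scalars are displayed.

```python
from itertools import chain, groupby


def shift(data, fill=0):
    # Collapse each contiguous run into (value, run_length).
    runs = [(key, len(list(group))) for key, group in groupby(data)]
    # Each run takes the previous run's value; the first run takes `fill`
    # (`fill` is a hypothetical parameter, not part of the original answer).
    shifted = [fill] + [key for key, _ in runs[:-1]]
    return list(chain.from_iterable(
        [value] * length for value, (_, length) in zip(shifted, runs)))


result = shift([2, 2, 2, 3, 3, 2, 2])
print(result)  # [0, 0, 0, 2, 2, 3, 3]
```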
Cory Kramer

If you want to stick with NumPy, you can achieve this with np.unique, using the return_counts option to get the count of each unique element.

Then, simply roll the values and construct a new array with np.repeat:

>>> s, i, c = np.unique(a, return_index=True, return_counts=True)
>>> s, i, c
(array([ 2,  3,  7,  9, 15]), array([0, 3, 6, 8, 5]), array([3, 2, 2, 1, 1]))

The three outputs are, respectively: the sorted unique elements, the index at which each unique element first occurs, and the count of each unique element.

np.unique sorts the values, so we first need to restore the original order of both the values and the counts. We can then shift the values with np.roll:

>>> idx = np.argsort(i)
>>> v = np.roll(s[idx], 1)
>>> v[0] = 0
>>> v
array([ 0,  2,  3, 15,  7])

Alternatively, use np.append, although this creates a full copy:

>>> v = np.append([0], s[idx][:-1])
>>> v
array([ 0,  2,  3, 15,  7])

Finally reassemble:

>>> np.repeat(v, c[idx])
array([ 0,  0,  0,  2,  2,  3, 15, 15,  7])
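
The steps above can be collected into one function. This is a minimal sketch under the assumption that every value forms a single contiguous run in `a`; the function name `shift_unique` and the `fill` parameter are hypothetical names chosen for illustration.

```python
import numpy as np


def shift_unique(a, fill=0):
    # Assumes every value forms a single contiguous run in `a`.
    # `shift_unique` and `fill` are hypothetical names for this sketch.
    values, first_index, counts = np.unique(
        a, return_index=True, return_counts=True)
    order = np.argsort(first_index)      # restore order of first appearance
    shifted = np.roll(values[order], 1)  # each run takes its predecessor's value
    shifted[0] = fill                    # the first run has no predecessor
    return np.repeat(shifted, counts[order])


a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
print(shift_unique(a))  # [ 0  0  0  2  2  3 15 15  7]
```

For an input with recurring values such as [2, 2, 2, 3, 3, 2, 2], this sketch returns [0 0 0 0 0 2 2], because np.unique merges the two separate runs of 2 into one count.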

Another, more general solution also works when there are recurring values in a. It relies on np.diff.

You can get the positions at which the value changes (the exclusive end of each run) with:

>>> i = np.diff(np.append(a, [0])).nonzero()[0] + 1
>>> i
array([3, 5, 6, 8, 9])

>>> idx = np.append([0], i)
>>> idx
array([0, 3, 5, 6, 8, 9])

The values are then obtained by indexing a zero-prepended copy of a with idx:

>>> v = np.append([0], a)[idx]
>>> v
array([ 0,  2,  3, 15,  7,  9])

The count for each value is given by:

>>> c = np.append(np.diff(i, prepend=0), [0])
>>> c
array([3, 2, 1, 2, 1, 0])

Finally, reassemble:

>>> np.repeat(v, c)
array([ 0,  0,  0,  2,  2,  3, 15, 15,  7])
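
The run-boundary idea can also be written as a single function. The sketch below is a variant rather than the exact code above: it finds boundaries by comparing neighbouring elements instead of appending a sentinel 0, because the sentinel appears not to register the final boundary when a itself ends in 0. The names `shift_runs` and `fill` are hypothetical.

```python
import numpy as np


def shift_runs(a, fill=0):
    # `shift_runs` and `fill` are hypothetical names for this sketch.
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    # Start index of every run: position 0 plus each position where the value changes.
    starts = np.concatenate(([0], np.flatnonzero(a[1:] != a[:-1]) + 1))
    # Length of each run; the last run ends at len(a).
    lengths = np.diff(np.concatenate((starts, [a.size])))
    # Value of the preceding run, with `fill` ahead of the first run.
    shifted = np.concatenate(([fill], a[starts[:-1]]))
    return np.repeat(shifted, lengths)


a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9, 7, 7, 8, 7, 7, 7])
print(shift_runs(a))  # [ 0  0  0  2  2  3 15 15  7  9  9  7  8  8  8]
```

The example input is the recurring-values array raised in the comments on this answer; the output has the same length as the input.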
Ivan
  • Clever, I got frustrated with trying to unsort the uniques. – Matt Hall Aug 13 '21 at 12:57
  • The resulting array seems to be incorrect though. It should be `[0, 0, 0, 2, 2, 3, 15, 15, 7]`. Also, what if some value of the array changes and then shows up again e.g. `[2, 2, 2, 3, 3, 2, 2]`? – bb1 Aug 13 '21 at 13:37
  • Indeed this won't work with recurring values in `a`. I have fixed the error, the array of counts `c` also needs to be unsorted... – Ivan Aug 13 '21 at 13:42
  • Thanks for the quick reply and well-explained solution! – Alex Pharaon Aug 13 '21 at 14:04
  • @bb1, and OP - I have an alternative solution which will work with recurring values in `a`. – Ivan Aug 13 '21 at 14:18
  • For line `c = np.append(np.diff(x, prepend=0, append=x[-1] + 1), [0])` I am getting error `TypeError: 'int' object is not subscriptable`. Do you know what I am doing wrong? – Alex Pharaon Aug 13 '21 at 14:23
  • My bad, it should be `c = np.append(np.diff(i, prepend=0, append=i[-1] + 1), [0])` with variable `i`, not `x`. – Ivan Aug 13 '21 at 14:29
  • The solution does not seem to work for `a = cp.array([2, 2, 2, 3, 3, 15, 7, 7, 9, 7, 7, 8, 7, 7, 7])`, with the resulting array coming out as `[ 0 0 0 2 2 3 15 15 7 9 9 7 8]` rather than `[ 0 0 0 2 2 3 15 15 7 9 9 7 8 8 8]` – Alex Pharaon Aug 13 '21 at 15:13
  • Fixed it with c[-2] = int(len(a) - (c.sum()-1)) – Alex Pharaon Aug 13 '21 at 15:38
  • Glad you found a way around it, I was actually editing my answer, see above... – Ivan Aug 13 '21 at 16:00

You can try this code:

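import numpy as np

a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
diff_a = np.diff(a)                        # step between neighbouring elements
idx = np.flatnonzero(diff_a)               # positions where the value changes
val = diff_a[idx]                          # size of each jump
val = np.insert(val[:-1], 0, a[0])         # delay each jump by one run; the first jump becomes a[0]
diff_a[idx] = val
res = np.append([0], np.cumsum(diff_a))    # rebuild the array, starting from 0
print(res)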
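
The same diff-and-cumsum idea can be wrapped in a function. This is a minimal sketch, not the answer's exact code: it builds the shifted jumps with np.concatenate instead of np.insert, which may make it easier to port to array libraries that lack an insert function (that portability is an assumption and is not tested here). The names `shift_cumsum` and `fill` are hypothetical.

```python
import numpy as np


def shift_cumsum(a, fill=0):
    # `shift_cumsum` and `fill` are hypothetical names for this sketch.
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    steps = np.diff(a)
    change = np.flatnonzero(steps)  # positions where the value changes
    if change.size == 0:
        # A single run: everything is replaced by the fill value.
        return np.full_like(a, fill)
    # Delay each jump by one run: the first jump becomes a[0] - fill,
    # and the last original jump is dropped.
    steps[change] = np.concatenate(([a[0] - fill], steps[change][:-1]))
    return np.concatenate(([fill], fill + np.cumsum(steps)))


a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9, 7, 7, 8, 7])
print(shift_cumsum(a))  # [ 0  0  0  2  2  3 15 15  7  9  9  7  8]
```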
Alex Alex
  • Although this was the fastest solution, it does not work when elements are repeated. For example, it does not work for array `a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9, 7, 7, 8, 7])` – Alex Pharaon Aug 13 '21 at 14:13
  • a is [ 2 2 2 3 3 15 7 7 9 7 7 8 7] result is [ 0 0 0 2 2 3 15 15 7 9 9 7 8] Where is my mistake? – Alex Alex Aug 13 '21 at 16:24
  • Ah yes, sorry, it seems when I updated your code to work in cupy it did not work for more complex examples like the one I gave above. Apparently cupy does not have `cp.insert()` so I had to find a work around for that. – Alex Pharaon Aug 13 '21 at 16:33

You can try this:

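import numpy as np

a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])

# Difference from the previous element (treating the element before a[0] as 0).
z = a - np.pad(a, (1, 0))[:-1]
# Move each non-zero jump to the position of the next one; the first becomes 0.
m = z != 0
z[m] = np.pad(z[m], (1, 0))[:-1]
print(z.cumsum())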

It gives:

[ 0  0  0  2  2  3 15 15  7]
bb1