
I want to replace the first N identical consecutive numbers in an array with 0.

import numpy as np


x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

A loop works, but what would be a faster, vectorized implementation?

i = 0
first = x[0]
while i < x.size and x[i] == first:  # check the bound first to avoid an IndexError
    x[i] = 0
    i += 1
Mathieu
  • maybe a duplicate: https://stackoverflow.com/questions/7352684/how-to-find-the-groups-of-consecutive-elements-in-a-numpy-array – Glauco Feb 09 '22 at 10:46
  • @Glauco I gave it a try, but I don't see the correct way to use this element-wise subtraction, `np.diff`, or `np.gradient`. – Mathieu Feb 09 '22 at 10:49

2 Answers


You can use argmax on a boolean array to get the index of the first value that differs from x[0].

Then slice and replace:

n = (x!=x[0]).argmax()  # 4
x[:n] = 0

output:

array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

intermediate array:

(x!=x[0])

#                            n=4
# [False False False False  True  True  True  True  True  True  True  True
#  True  True  True  True  True  True  True]
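
One caveat: the slice above assumes that at least one element differs from x[0]. If every element is equal, the mask is all False, argmax returns 0, and nothing gets replaced. A minimal sketch that covers this case (the function name zero_leading_run is made up for illustration, and it works on a copy rather than in place):

```python
import numpy as np

def zero_leading_run(x):
    """Return a copy of x with its leading run of equal values set to 0."""
    out = np.asarray(x).copy()
    if out.size == 0:
        return out
    changed = out != out[0]
    # argmax returns 0 on an all-False mask, so treat "no change" explicitly
    n = changed.argmax() if changed.any() else out.size
    out[:n] = 0
    return out
```

For an input such as np.array([5, 5, 5]) this zeroes the whole array, which matches what the original loop does.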
mozway

My solution is based on itertools.groupby, so start with import itertools.

This function creates groups of consecutive equal values, unlike e.g. the pandas version of groupby, which collects all equal values from the input into a single group.
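
As a small illustration of that grouping behaviour (the list data is just a toy input chosen for this example), note how the value 1 ends up in two separate groups because its occurrences are not adjacent:

```python
import itertools

data = [1, 1, 2, 1, 1, 1]
# One (value, run length) pair per run of consecutive equal values
runs = [(key, len(list(grp))) for key, grp in itertools.groupby(data)]
print(runs)  # [(1, 2), (2, 1), (1, 3)]
```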

Another important feature is that you can assign any value to N, and only the first N values of each sequence of at least N consecutive equal values will be replaced.

To test my code, I set N = 4 and defined the source array as:

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])

Note that it contains 5 consecutive values of 2 at the end.

Then, to get the expected result, run:

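rv = []
# groupby yields one group per run of consecutive equal values
for key, grp in itertools.groupby(x):
    lst = list(grp)
    lgth = len(lst)
    # Zero the first N items only when the run has at least N items
    if lgth >= N:
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)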

The result is:

[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]

Note that a sequence of 4 zeroes occurs:

  • at the beginning (all 4 values of 1 have been replaced),
  • near the end (of the 5 values of 2, the first 4 have been replaced).
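
Since the question asks for a vectorized version, the same per-run rule can probably also be expressed in plain NumPy, without the Python-level loop. The following is only a sketch under that assumption: the function name zero_first_n_of_runs is hypothetical, and it follows the rule of the loop above (runs shorter than N are left untouched).

```python
import numpy as np

def zero_first_n_of_runs(x, n):
    """Zero the first n items of every run of at least n equal values."""
    x = np.asarray(x)
    out = x.copy()
    if x.size == 0:
        return out
    # Indices where a new run starts (the first element always starts a run)
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Length of each run
    lengths = np.diff(np.r_[starts, x.size])
    # For every element: its offset inside its run, and the length of that run
    pos = np.arange(x.size) - np.repeat(starts, lengths)
    run_len = np.repeat(lengths, lengths)
    out[(run_len >= n) & (pos < n)] = 0
    return out
```

Called with the test array above and n = 4, it should give the same values as the groupby loop.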
Valdi_Bo