I have a 1D NumPy array of integers in which I want to replace each zero with the previous non-zero value if and only if the next non-zero value is the same.

For example, an array of:

in: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1])
out: [1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1]

should become

out: [1,1,1,1,0,0,2,0,3,3,3,3,3,1,1,1]

Is there a vectorized way to do this? I found a way to forward-fill zeros here, but not how to do it with exceptions, i.e. to leave unfilled the zeros that lie between two different non-zero values.

Bram Zijlstra

1 Answer

Here's a vectorized approach that takes inspiration from NumPy based forward-filling for the forward-filling part of this solution, along with masking and slicing -

import numpy as np

def forward_fill_ifsame(x):
    # Get mask of non-zeros and use it to build forward-filled indices
    mask = x!=0
    idx = np.where(mask,np.arange(len(x)),0)
    np.maximum.accumulate(idx,axis=0, out=idx)

    # Now handle the additional requirement: fill only if the previous
    # and next non-zero values are the same.
    # Work on a copy so that the input data is left unchanged
    x1 = x.copy()

    # Get non-zero elements
    xm = x1[mask]

    # Among the selected elements, zero out each one whose next
    # non-zero element is different, so that it fills nothing forward
    xm[:-1][xm[1:] != xm[:-1]] = 0

    # Assign the valid ones to x1. Invalid ones become zero.
    x1[mask] = xm

    # Use idx for indexing to do the forward filling
    out = x1[idx]

    # Restore the original non-zero values, including those zeroed out above
    out[mask] = x[mask]
    return out
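
To make the steps inside the function easier to follow, here is a short trace of the two intermediate arrays for the sample input from the question. It is only an illustration of the function above, not a separate method; the one deviation is that `np.maximum.accumulate` is called without `out=`, which should give the same result.

```python
import numpy as np

x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1])
mask = x != 0

# For every position, the index of the most recent non-zero element
idx = np.where(mask, np.arange(len(x)), 0)
idx = np.maximum.accumulate(idx)
# idx -> [0 0 2 3 3 3 6 6 8 8 8 8 12 13 13 15]

# Non-zero values in order of appearance (boolean indexing returns a copy)
xm = x[mask]
# xm -> [1 1 1 2 3 3 1 1]

# Zero out every non-zero value whose successor differs from it
xm[:-1][xm[1:] != xm[:-1]] = 0
# xm -> [1 1 0 0 3 0 1 1]
```

The zeros left in `xm` mark the non-zero elements that must not be propagated forward; after `x1[idx]` fills from them, the final `out[mask] = x[mask]` puts their original values back.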

Sample runs -

In [289]: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1])

In [290]: np.vstack((x, forward_fill_ifsame(x)))
Out[290]: 
array([[1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 3, 1, 0, 1],
       [1, 1, 1, 1, 0, 0, 2, 0, 3, 3, 3, 3, 3, 1, 1, 1]])

In [291]: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,1,1,0,1])

In [292]: np.vstack((x, forward_fill_ifsame(x)))
Out[292]: 
array([[1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 0, 1],
       [1, 1, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 1, 1]])

In [293]: x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,1,1,0,2])

In [294]: np.vstack((x, forward_fill_ifsame(x)))
Out[294]: 
array([[1, 0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 0, 2],
       [1, 1, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 0, 2]])
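
For cross-checking the vectorized function on other inputs, a plain loop over consecutive non-zero positions can serve as a reference. This is a minimal sketch, and the name `fill_ifsame_loop` is made up for illustration; it is not vectorized and is likely to be slower on large arrays.

```python
import numpy as np

def fill_ifsame_loop(x):
    # Hypothetical reference implementation, intended only for checking results
    x = np.asarray(x)
    out = x.copy()
    nz = np.flatnonzero(x)  # positions of the non-zero elements
    # Walk over each pair of neighbouring non-zero positions
    for left, right in zip(nz[:-1], nz[1:]):
        if x[left] == x[right]:
            # Same value on both sides: fill the zeros in between
            out[left:right] = x[left]
    return out

x = np.array([1,0,1,1,0,0,2,0,3,0,0,0,3,1,0,1])
print(fill_ifsame_loop(x))
# [1 1 1 1 0 0 2 0 3 3 3 3 3 1 1 1]
```

Comparing both functions with `np.array_equal` on random integer arrays is one way to gain confidence in the vectorized version.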
Divakar
    @Divakar When I studied *Tensor decompositions*, one of the tasks was to analyse which terms people commonly use to *hyperlink* pages. It was done on a real dataset, and the analysis did not come out well, because people almost always used terms like "see here", "in this post", "another blog", "at this link" etc., which are not very informative. So, when hyperlinking, it'd be a good idea to use the question topic instead, viz. **"Most efficient way to forward-fill NaN values"**, which makes the link title informative :) and a bit nicer as well – kmario23 Jan 14 '18 at 17:43
    @Divakar, you always inspire me with your ability to comprehend seemingly complex requirements and find straightforward, probably most efficient solutions – Siraj S. Jan 15 '18 at 14:52