replace zeros in numpy array with linear interpolation between its preceding and succeeding values

Question

Assuming we have an array a = np.array([1,2,0,4,0,5,0,0,11]), how can we get:

array([ 1,  2,  3,  4,  4.5,  5,  7,  9, 11])

What I have tried is:

import numpy as np
from scipy.interpolate import interp1d

a = np.array([1,2,0,4,0,5,0,0,11])
b = a[np.nonzero(a)]
brange = np.arange(b.shape[0])
interp = interp1d(brange, b)

This seems to do the actual job of finding in-between values. For instance:

print (interp(1), interp(1.5), interp(2), interp(2.5), interp(3))
#out: 2.0 3.0 4.0 4.5 5.0

But I can't figure out how to re-construct my original array from interp. I also tried the solution to this question, but I had the exact same problem with that solution as well.

UPDATE:

I ran a quick benchmark of both solutions (numpy and pandas). Here is the result:

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

y = np.array([1,2,0,4,0,5,0,0,11])

def test1(y):

    x = np.arange(len(y))
    idx = np.nonzero(y)
    interp = interp1d(x[idx],y[idx])

    return interp(x)

def test2(y):
    # Note: the zeros are never converted to NaN here, so interpolate() has nothing to fill
    s = pd.Series(y)
    s.interpolate(inplace=True)
    return s.values

%timeit t1 = test1(y)
%timeit t2 = test2(y)

139 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
158 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The numpy version is about 12% faster. Not as good as I hoped, but since the code is going to be run several million times, it is probably worth the effort.
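
Before comparing timings, it may be worth confirming that the two approaches return the same values. The sketch below assumes the zeros have to be turned into NaN before pandas will interpolate them; the function names interp_scipy and interp_pandas are hypothetical and not part of the original benchmark.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

y = np.array([1, 2, 0, 4, 0, 5, 0, 0, 11])

def interp_scipy(y):
    # Interpolate over the positions of the non-zero entries only
    x = np.arange(len(y))
    idx = np.nonzero(y)
    return interp1d(x[idx], y[idx])(x)

def interp_pandas(y):
    # Assumption: zeros mark missing data, so mark them as NaN first
    s = pd.Series(y, dtype=float).replace(0, np.nan)
    return s.interpolate().to_numpy()

# True when both approaches agree element by element
same = np.allclose(interp_scipy(y), interp_pandas(y))
print(same)
```

Timing the two functions only says something useful once this check passes.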

Thomas Kühn · Answer 1 · 2017-08-21T11:14:01.147

You need to give interp1d a y-array without the zeros and an x-array that skips the positions of those zeros. Then evaluate the interpolation function on an x-array that holds all the original x-values, including the ones at which you want interpolated values. Since your data are equally spaced, you can use np.arange to produce the x-values and np.where to locate the zeros.

Here is some example code:

import numpy as np
from scipy.interpolate import interp1d

y = np.array([1,2,0,4,0,5,0,0,11])
xnew = np.arange(len(y))

zero_idx = np.where(y==0)
xold = np.delete(xnew,zero_idx)
yold = np.delete(y, zero_idx)

print('before')
print(xold)
print(yold)

f = interp1d(xold,yold)

ynew = f(xnew)

print()
print('after')
print(xnew)
print(ynew)

The result looks like this:

before
[0 1 3 5 8]
[ 1  2  4  5 11]

after
[0 1 2 3 4 5 6 7 8]
[  1.    2.    3.    4.    4.5   5.    7.    9.   11. ]

EDIT:

Actually, you don't need np.delete; you can just index with the positions of the non-zero values:

y = np.array([1,2,0,4,0,5,0,0,11])
x = np.arange(len(y))
idx = np.where(y!=0)        #or np.nonzero(y) -- thanks DanielF
f = interp1d(x[idx],y[idx])
ynew = f(x)

or even `idx = np.nonzero(y)` – Daniel F Aug 21 '17 at 11:04
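
As a variation on the approach above, the same steps can be sketched with np.interp, which is a different function from interp1d but needs no SciPy import. The helper name fill_zeros is hypothetical, and the sketch assumes that every zero marks a missing value and that the array is 1-D with at least one non-zero entry.

```python
import numpy as np

def fill_zeros(y):
    """Replace zeros in a 1-D array by linear interpolation (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))       # positions of all samples
    known = y != 0              # boolean mask of the values to keep
    # Evaluate at every position, using only the known points as support
    return np.interp(x, x[known], y[known])

print(fill_zeros([1, 2, 0, 4, 0, 5, 0, 0, 11]))
```

Unlike interp1d with its default settings, np.interp does not raise for positions outside the known range; it holds the first or last known value there.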

score 4 · Answer 2 · answered Aug 21 '17 at 10:18

You can use the pandas interpolate function:

import pandas as pd
import numpy as np

a = pd.Series([1,2,0,4,0,5,0,0,11])

a.replace(0, np.nan, inplace=True)   # np.NaN was removed in NumPy 2.0; np.nan works everywhere

a.interpolate()

0     1.0
1     2.0
2     3.0
3     4.0
4     4.5
5     5.0
6     7.0
7     9.0
8    11.0

Also: a.interpolate().values will give you an array of values.

# output: array([  1. ,   2. ,   3. ,   4. ,   4.5,   5. ,   7. ,   9. ,  11. ])

Also: interpolate takes an inplace argument, which you can use to modify the Series directly.

Vikash Singh

Thanks for your answer, but I need a numpy solution without the overhead of converting to `Series`, if possible. – Alz Aug 21 '17 at 10:26
@AlirezaSohofi is `interp(5)` giving you the output you expect? – Vikash Singh Aug 21 '17 at 10:46
No, and that was actually the problem I had. Check out @Daniel's solution and my update. – Alz Aug 21 '17 at 15:33
@AlirezaSohofi Thanks for the update. I still find the pandas interpolation much simpler. Still, thanks :) – Vikash Singh Aug 21 '17 at 15:57
I agree with you. I would never bother to do it the numpy way for one-off jobs, but for production code I think performance should be given priority. – Alz Aug 21 '17 at 16:01

Daniel F · Accepted Answer · 2017-08-21T11:27:41.520

I think your implementation is a bit off. What you want is something closer to what @Thomas came up with:

import numpy as np
from scipy.interpolate import interp1d

y = np.array([1,2,0,4,0,5,0,0,11])
x = np.arange(len(y))    # must be defined before it is indexed
idx = np.nonzero(y)
interp = interp1d(x[idx],y[idx])

ynew = interp(x)

If you want to reconstruct your original array from interp, you just need its .x and .y attributes:

a_ = np.zeros(interp.x[-1] + 1)
a_[interp.x] = interp.y

Of course, this drops any trailing zeros from the original a, because a.size is not stored in the interpolator. If you have kept the size elsewhere (for example in ynew.shape), you can instead initialize a_ = np.zeros_like(ynew).
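
The edge case mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the input array with leading and trailing zeros is made up, and holding the nearest non-zero value at the edges (through interp1d's bounds_error and fill_value arguments) is one possible choice, not something the question asked for.

```python
import numpy as np
from scipy.interpolate import interp1d

a = np.array([0, 1, 0, 3, 0, 0])    # made-up input with zeros at both ends
x = np.arange(len(a))
idx = np.nonzero(a)

# By default interp1d raises ValueError outside the known x-range.
# Here the first/last non-zero value is held at the edges instead.
interp = interp1d(x[idx], a[idx], bounds_error=False,
                  fill_value=(a[idx][0], a[idx][-1]))
filled = interp(x)

# Rebuild the original array; a.size keeps the edge zeros.
a_ = np.zeros(a.size)
a_[interp.x.astype(int)] = interp.y  # cast so the positions are valid indices

print(filled)
print(a_)
```

Sizing a_ from a.size rather than interp.x[-1] + 1 is what preserves the trailing zeros in the reconstruction.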
