Interpolating backwards with multiple consecutive nan's in Pandas/Python?

Question

I have an array with missing values in various places.

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)

0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
5    6.0
6    NaN
7    8.0
8    9.0
dtype: float64

For each NaN, I want to take the value following it and divide it by two, then propagate that result backwards to the next consecutive NaN, so I would end up with:

0    0.75
1    1.5
2    3.0
3    4.0
4    5.0
5    6.0
6    4.0
7    8.0
8    9.0
dtype: float64

I've tried df.interpolate(), but that doesn't seem to work with consecutive NaN's.

Even if `interpolate()` did work, it wouldn't do what you need. By the way, your "interpolation" rule seems quite odd. Are you sure this is how you want to do it? — Ma0, Aug 24 '16 at 11:37
@Ev.Kounis I'm not entirely sure this is the method I want, but right now I am just replicating what someone else has done with their data. Then I'll figure out a better way. In reality, I should be doing a curve-fitting exercise on the data to predict the missing values. — BobbyJohnsonOG, Aug 24 '16 at 12:18
what is typically done is that you assume the missing segment to be a straight line and based on the closest available points before and after the 'NaN' you calculate a value. This is what's called linear interpolation (see https://en.wikipedia.org/wiki/Linear_interpolation) — Ma0, Aug 24 '16 at 12:23
@Ev.Kounis Does linear interpolation work with consecutive NaN's, or NaN's that are at the beginning or end of a series? If so, why doesn't Pandas interpolation(method='linear', axis=1, limit_direction='both') work when I tried it before? It doesn't seem to touch NaN's at the beginning or end of my series. — BobbyJohnsonOG, Aug 24 '16 at 12:27
To handle those you would have to **extra**polate, that is, "guess" values that are **out**side the given range by extending what comes next or came before. — Ma0, Aug 24 '16 at 12:37
@Ev.Kounis, thanks for that - found a good answer on extrapolating in Pandas here https://stackoverflow.com/questions/22491628/extrapolate-values-in-pandas-dataframe — BobbyJohnsonOG, Aug 24 '16 at 12:45
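
For reference, here is a minimal sketch of what interpolate() does with the series from the question, assuming a recent pandas version (the names s, default and both are mine, not from the thread). Linear interpolation fills the interior gap, but neither call reproduces the halving rule asked for:

```python
import numpy as np
import pandas as pd

x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
s = pd.Series(x)

# Default: linear method, filling in the forward direction only,
# so the two leading NaNs are left untouched
default = s.interpolate()

# limit_direction='both' also fills the leading NaNs, but it repeats
# the first valid value instead of extrapolating a trend
both = s.interpolate(limit_direction='both')

print(default)
print(both)
```

This seems to be why interpolate() looked like it was ignoring the NaNs at the start of the series: by default it only fills forward from a valid value.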

jezrael · Accepted Answer · 2016-08-24T12:19:23.653

Another solution, using bfill() to fill the gaps and ffill() (the same as fillna with method='ffill') to count the consecutive NaNs:

# reverse the order of the Series and flag the NaNs
b = df[::-1].isnull()
# number the NaNs within each consecutive run, multiply by 2 and replace 0 with 1
a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})

print(a)
8    1
7    1
6    2
5    1
4    1
3    1
2    1
1    2
0    4
dtype: int32

print(df.bfill().div(a))
0    0.75
1    1.50
2    3.00
3    4.00
4    5.00
5    6.00
6    4.00
7    8.00
8    9.00
dtype: float64
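
One caveat: mul(2) gives the divisors 2 and 4 for the first two NaNs of a run, which matches the example, but a third consecutive NaN would be divided by 6 rather than 8. If the intended rule is repeated halving (my reading of the question, not something stated in the answer), a power of two is a closer fit. A minimal sketch on a made-up series with a run of three NaNs; the names s, run_pos, divisor and result are hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up data: a run of three NaNs followed by a single NaN
s = pd.Series([np.nan, np.nan, np.nan, 8.0, np.nan, 6.0])

# Reverse the Series and flag the NaNs
b = s[::-1].isnull()

# Position of each NaN within its run (1, 2, 3, ...), 0 for valid values
run_pos = b.cumsum() - b.cumsum().where(~b).ffill()

# Halving once per position means dividing by 2**position;
# valid values get 2**0 == 1, so no replace() step is needed
divisor = 2 ** run_pos

# div() aligns on the index, so the reversed divisor lines up with s
result = s.bfill().div(divisor)
print(result)
```

On the series from the question this gives the same result as the version above, since no run there is longer than two.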

Timings (len(df)=9k):

In [315]: %timeit (mat(df))
100 loops, best of 3: 11.3 ms per loop

In [316]: %timeit (jez(df1))
100 loops, best of 3: 2.52 ms per loop

Code for timings:

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()

def jez(df):
    b = df[::-1].isnull()
    a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})
    return (df.bfill().div(a))

def mat(df):
    prev = 0
    new_list = []
    for i in df.values[::-1]:
        if np.isnan(i):
            new_list.append(prev/2.)
            prev = prev / 2.
        else:
            new_list.append(i)
            prev = i
    return pd.Series(new_list[::-1])

print (mat(df))
print (jez(df1))

I love the way this works! Do you have a reason to have `mul(2)` and `div(a)`? Not wanting to have fractions? — Mathias711, Aug 24 '16 at 12:05

score 2 · Answer 2 · answered Aug 24 '16 at 11:37

You can do something like this:

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)

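prev = 0
new_list = []
# walk the values from last to first
for i in df.values[::-1]:
    if np.isnan(i):
        # missing: use half of the value that follows it
        new_list.append(prev/2.)
        prev = prev / 2.
    else:
        # present: keep it and remember it for the next NaN
        new_list.append(i)
        prev = i
# restore the original order
df = pd.Series(new_list[::-1])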

It loops over the values of df in reverse, keeping track of the previous value. If the current value is not NaN it is appended as is; otherwise half of the previous value is appended.

This might not be the most sophisticated Pandas solution, but you can change the behavior quite easily.
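
As an illustration of changing the behavior, the loop can be wrapped in a function with the scaling factor as a parameter. This is only a sketch: the name fill_backward and the factor argument are hypothetical, and unlike the loop above it starts from NaN instead of 0, so NaNs at the very end of the series stay NaN.

```python
import numpy as np
import pandas as pd

def fill_backward(series, factor=0.5):
    """Fill each NaN with factor times the value that follows it."""
    prev = np.nan  # assumption: trailing NaNs have nothing to borrow from
    out = []
    # walk the values from last to first
    for value in series.values[::-1]:
        if np.isnan(value):
            # missing: scale the value that follows and carry it along
            prev = prev * factor
            out.append(prev)
        else:
            # present: keep it and remember it for the next NaN
            out.append(value)
            prev = value
    # restore the original order and keep the original index
    return pd.Series(out[::-1], index=series.index)

x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
filled = fill_backward(pd.Series(x))
print(filled)
```

Passing a different factor, for example fill_backward(pd.Series(x), factor=0.9), changes the rule without touching the loop.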
