I know that this question has been asked before, but the suggested solutions that I found do not work for me. Maybe I am trying to do something that is simply not possible, but let me explain.

I have time-series data that contains some values of 0. I would like to interpolate the zeros in the data using pandas.DataFrame.interpolate.

The code:

import pandas as pd
import numpy as np

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]

df = pd.DataFrame(data=data) # Data to pandas dataframe
df.replace(to_replace=0, value=np.nan, inplace=True) # Replace 0 by nan
ip = df.interpolate(method="nearest", order=3, limit=None,
                    limit_direction=None)
print(ip)

The result of print(ip):

           0
0        NaN
1  -1.315270
2  -2.254480
3  -0.965348
4  -1.111680
5  -0.050605
6  -0.605522
7   2.013370
8   2.013370
9   2.419310
10  2.419310
11  0.821425
12  0.402411
13       NaN

The problem: pandas does not interpolate the first and last values of the data, but leaves them as NaN (the replaced zeros). I tried out all the options of pandas.DataFrame.interpolate, forward and backward, but none of them fills the first and last value. Is this simply impossible with pandas, or am I doing something wrong?

Philipp
  • You need further information about the "shape" of the values. Furthermore, pandas has no function that does this directly. Here is a more sophisticated answer to a similar post: https://stackoverflow.com/a/35959909/16872314 – Marcello Zago Feb 22 '23 at 12:09
  • What you want is an extrapolation. What would be the explicit expected output? – mozway Feb 22 '23 at 12:26
  • Thanks. I just read through the link provided by Marcello Zago. There was a mistake in my script: ```method="nearest"``` should be ```method="polynomial"```. The original time series is much longer than the example provided here. To answer your question: my aim is to apply the cubic polynomial not just for interpolation, but for extrapolation as well. – Philipp Feb 22 '23 at 12:31

1 Answer

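What you want is an extrapolation: the first and last NaN lie outside the range of known values, so there is nothing to interpolate between. You need to decide how to extrapolate.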
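To see which gaps are interior (interpolation) and which sit at the edges (extrapolation), here is a small sketch. It assumes the question's data held in a Series; the names `interior_gaps` and `edge_gaps` are only illustrative.

```python
import numpy as np
import pandas as pd

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]

# Same preprocessing as in the question: treat zeros as missing values
s = pd.Series(data, dtype=float).replace(0, np.nan)

# Labels of the first and last known (non-NaN) values
first, last = s.first_valid_index(), s.last_valid_index()

# NaNs between first and last have known neighbours on both sides
inside = s.loc[first:last]
interior_gaps = inside.index[inside.isna().to_numpy()].tolist()

# NaNs outside that range have a known neighbour on one side only
edge_gaps = s.index[s.isna().to_numpy()].difference(inside.index).tolist()

print(interior_gaps)  # [8, 9]
print(edge_gaps)      # [0, 13]
```

Only the positions in `edge_gaps` need an extrapolation rule; the rest can be handled by `interpolate` alone.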

You can ffill/bfill:

ip = (df.interpolate(method="nearest", order=3, limit=None,
                     limit_direction='both')
        .ffill().bfill()
     )

Output:

           0
0  -1.315270
1  -1.315270
2  -2.254480
3  -0.965348
4  -1.111680
5  -0.050605
6  -0.605522
7   2.013370
8   2.013370
9   2.419310
10  2.419310
11  0.821425
12  0.402411
13  0.402411
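For reference, a self-contained variant of the same idea that avoids the SciPy dependency of `method="nearest"`. This is a sketch that swaps in linear interpolation for the interior gaps (an assumption, not the method from the question) and then holds the edge values constant.

```python
import numpy as np
import pandas as pd

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]

# Treat zeros as missing values, as in the question
s = pd.Series(data, dtype=float).replace(0, np.nan)

filled = (
    s.interpolate(method="linear", limit_area="inside")  # interior gaps only
     .ffill()   # trailing NaN takes the last known value
     .bfill()   # leading NaN takes the first known value
)
print(filled)
```

With `limit_area="inside"` the interpolation step leaves the edges alone, so it is explicit that the edge values come from `ffill`/`bfill`.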


Or use a spline:

ip = (df.interpolate(method="nearest", order=3, limit=None,
                     limit_direction=None)
        .fillna(
      df.interpolate(method="spline", order=3, limit=None,
                     limit_direction='both')
        )
     )

Output:

           0
0  -0.585237
1  -1.315270
2  -2.254480
3  -0.965348
4  -1.111680
5  -0.050605
6  -0.605522
7   2.013370
8   2.013370
9   2.419310
10  2.419310
11  0.821425
12  0.402411
13 -1.951716
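If you want a cubic polynomial at the edges specifically, one option is to fit it yourself. The following is a rough sketch, not a pandas feature: the helper `fill_edges_with_cubic` is hypothetical, it fits a cubic with `numpy.polyfit` through the four known points closest to each edge and evaluates it at the missing positions. Interior gaps are filled linearly here only to keep the example free of SciPy.

```python
import numpy as np
import pandas as pd

def fill_edges_with_cubic(s: pd.Series, n_fit: int = 4) -> pd.Series:
    """Fill leading/trailing NaNs from a cubic fitted to the n_fit nearest known points.

    Hypothetical helper for illustration; assumes evenly spaced samples.
    """
    out = s.copy()
    x = np.arange(len(s), dtype=float)      # positions used as the x axis
    y = s.to_numpy(dtype=float)
    known = np.flatnonzero(~np.isnan(y))    # positions of known values
    if n_fit < 4 or len(known) < n_fit:
        raise ValueError("a cubic fit needs at least 4 known points")

    first, last = int(known[0]), int(known[-1])
    if first > 0:                           # leading NaNs
        idx = known[:n_fit]
        coeffs = np.polyfit(x[idx], y[idx], deg=3)
        out.iloc[:first] = np.polyval(coeffs, x[:first])
    if last < len(s) - 1:                   # trailing NaNs
        idx = known[-n_fit:]
        coeffs = np.polyfit(x[idx], y[idx], deg=3)
        out.iloc[last + 1:] = np.polyval(coeffs, x[last + 1:])
    return out

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]

s = pd.Series(data, dtype=float).replace(0, np.nan)
interior = s.interpolate(method="linear", limit_area="inside")
result = fill_edges_with_cubic(interior)
print(result)
```

Keep in mind that a cubic can swing far from the data just one or two steps past the last known point, so check the extrapolated values for plausibility on your real series.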


mozway