
I have a DataFrame with 20 years of data for several stations, and I would like to interpolate the observations separately for each station. I used the following line, but it does not work:

df.groupby('station')['observations'].interpolate(method='linear')

Here is a sample of my data:

station  year    observations
0   3939    2000    0.346518
1   3939    2001    0.278250
2   3939    2002    1.096147
3   3939    2003    0.423948
4   3939    2004    0.000000
5   3939    2005    0.000000
6   3939    2006    0.000000
7   3939    2007    0.663922
8   3939    2008    0.000000
9   3939    2009    0.000000
10  3939    2010    0.000000
11  3939    2011    2.921322
12  3939    2012    1.463399
13  3939    2013    1.402697
14  3939    2014    0.000000
15  3939    2015    0.000000
16  3939    2016    0.000000
17  3939    2017    0.000000
18  3939    2018    0.000000
19  3939    2019    2.599236
20  3939    2020    1.428136
21  3953    2000    5.893202
22  3953    2001    7.227092
23  3953    2002    6.489147
24  3953    2003    4.961213
25  3953    2004    0.000000
26  3953    2005    0.000000
27  3953    2006    5.273121
28  3953    2007    0.000000
29  3953    2008    0.000000
30  3953    2009    0.000000
31  3953    2010    5.591221
32  3953    2011    0.000000
33  3953    2012    0.000000
34  3953    2013    4.797106
35  3953    2014    8.109661
36  3953    2015    0.000000
37  3953    2016    1.798583
38  3953    2017    0.000000
39  3953    2018    0.000000
40  3953    2019    0.000000
41  3953    2020    6.440142
42  3977    2000    14.236954
43  3977    2001    17.216910
44  3977    2002    10.210559
45  3977    2003    0.000000
46  3977    2004    0.000000
47  3977    2005    10.463710
48  3977    2006    0.000000
49  3977    2007    0.000000
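One likely reason the line above has no visible effect is that `interpolate` only fills NaN values, and the gaps in this data are stored as 0.0. The minimal sketch below reproduces this on a slice of station 3939 (2000 to 2008). It assumes that a zero means a missing observation, which is the assumption the answers below also make.

```python
import numpy as np
import pandas as pd

# Slice of the sample data: station 3939, years 2000 to 2008
df = pd.DataFrame({
    "station": [3939] * 9,
    "year": list(range(2000, 2009)),
    "observations": [0.346518, 0.278250, 1.096147, 0.423948,
                     0.0, 0.0, 0.0, 0.663922, 0.0],
})

# The column contains no NaN, so interpolate has nothing to fill
unchanged = df["observations"].interpolate(method="linear")
print(unchanged.equals(df["observations"]))  # True

# Assumption: 0 marks a missing observation, so convert it to NaN first
filled = df["observations"].replace(0, np.nan).interpolate(method="linear")
print(filled)
```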

Thanks

GeoBeez

2 Answers


Thanks to Henry Ecker, who answered my question in a comment on my previous post. Because interpolate only fills NaN values, the zeros have to be turned into NaN first:

import pandas as pd

df['observations'] = (
    df['observations']
        .mask(df['observations'].eq(0))  # Replace 0 with NaN
        .groupby(df['station'])  # Group by station
        .transform(pd.Series.interpolate, method='linear')  # Interpolate within each station
)
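The sketch below shows what this does per station on a small excerpt of the sample. The excerpt and its row order are chosen only for illustration: station 3977, which ends on zeros, is placed directly before station 3953.

```python
import pandas as pd

# Excerpt of the sample, ordered so station 3977 (ending on zeros) comes right before station 3953
df = pd.DataFrame({
    "station": [3977] * 6 + [3953] * 5,
    "year": list(range(2002, 2008)) + list(range(2016, 2021)),
    "observations": [10.210559, 0.0, 0.0, 10.463710, 0.0, 0.0,
                     1.798583, 0.0, 0.0, 0.0, 6.440142],
})

df["observations"] = (
    df["observations"]
        .mask(df["observations"].eq(0))  # treat 0 as missing (NaN)
        .groupby(df["station"])  # interpolate each station on its own
        .transform(pd.Series.interpolate, method="linear")
)
print(df)
```

The default is `limit_direction='forward'`. Under that default, the trailing gap of station 3977 (2006 and 2007) is filled with that station's last valid value, 10.463710. Because the interpolation runs within each group, those rows are not interpolated toward station 3953's first value. A gap at the very start of a station would stay NaN.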

He also suggested this post for more information.

Henry Ecker
GeoBeez

Alternatively, you could write a lambda function and pass it to .apply():

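import numpy as np
import pandas as pd
interp = lambda g: g.replace(0, np.nan).interpolate(method='linear')  # 0 -> NaN, then interpolate
df['observations'] = df.groupby('station', group_keys=False)['observations'].apply(interp)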
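One detail to check with the apply version is how the result is indexed. In pandas 2.0 and later, `groupby(...).apply()` adds the group key to the result's index by default, so the result does not line up with `df` when assigned back unless `group_keys=False` is passed. The sketch below, on an excerpt of the sample, checks that the apply version and the mask + transform version from the accepted answer agree.

```python
import numpy as np
import pandas as pd

# Excerpt of the sample data (two stations)
df = pd.DataFrame({
    "station": [3953] * 5 + [3977] * 6,
    "year": list(range(2016, 2021)) + list(range(2002, 2008)),
    "observations": [1.798583, 0.0, 0.0, 0.0, 6.440142,
                     10.210559, 0.0, 0.0, 10.463710, 0.0, 0.0],
})

interp = lambda g: g.replace(0, np.nan).interpolate(method="linear")

# group_keys=False keeps the original row labels, so the result aligns with df
via_apply = df.groupby("station", group_keys=False)["observations"].apply(interp)

# The mask + transform version from the accepted answer
via_transform = (
    df["observations"]
        .mask(df["observations"].eq(0))
        .groupby(df["station"])
        .transform(pd.Series.interpolate, method="linear")
)

print(np.allclose(via_apply.sort_index(), via_transform))  # True
df["observations"] = via_apply
```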
Akshay Sehgal