I have a dataframe with some columns that I added myself. One specific column records the maximum and minimum tide levels.

Pandas Column mostly empty but with some reference values

import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4],'b':[np.nan,np.nan,3,4]},columns=['a','b']) 
df

The problem is that the column is mostly empty, because it only contains those peak values and not the intermediate ones. I would like to fill the missing values with a function similar to the one in the image below.

I want to fill it with a function of this kind

Thank you in advance.

  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Feb 03 '18 at 09:53
  • If you want to get a sufficient answer, you should be a bit more specific. Fitting data with a custom function is possible for instance with [scipy.optimize.curve_fit](https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.optimize.curve_fit.html). But what is your index data here? [A datetime column?](https://pandas.pydata.org/pandas-docs/stable/api.html#top-level-dealing-with-datetimelike) – Mr. T Feb 03 '18 at 10:31
  • @Piinthesky Among others, I have a datetime column which registers the date every 15 minutes. The other column (the one I am interested in filling) is the column I explained previously. Thank you for your help – Borja Fernández Antelo Feb 03 '18 at 10:44
  • @Piinthesky I don't know how to add the code example and see the output – Borja Fernández Antelo Feb 03 '18 at 12:42

1 Answer

Since you didn't specify which datetime format your pandas dataframe uses, here is an example that uses the index values as x data. You can use index values like this as long as the rows are evenly spaced and have no gaps.

import pandas as pd
import numpy as np
from scipy.optimize import curve_fit

tide = np.asarray([-1.2,np.nan,np.nan,3.4,np.nan,np.nan,-1.6,np.nan,np.nan,3.7,np.nan,np.nan,-1.4,])
tide_time = np.arange(len(tide))
df = pd.DataFrame({'a':tide_time,'b':tide}) 

#define your fit function with amplitude, frequency, phase and offset
def fit_func(x, ampl, freq, phase, offset):
    return ampl * np.sin(freq * x + phase) + offset

#extract rows that contain your values
df_nona = df.dropna()

#perform the least-squares fit on the known peaks, get the coefficients of the fitted function
coeff, _mat = curve_fit(fit_func, df_nona["a"], df_nona["b"])
print(coeff)

#append a column with the fitted values for every row
df["fitted_b"] = fit_func(df["a"], *coeff)
print(df)

Output for my sample data

#amplitude    frequency   phase       offset
[ 2.63098177  1.12805625 -2.17037976  1.0127173 ]

     a    b  fitted_b
0    0 -1.2 -1.159344
1    1  NaN -1.259341
2    2  NaN  1.238002
3    3  3.4  3.477807
4    4  NaN  2.899605
5    5  NaN  0.164376
6    6 -1.6 -1.601058
7    7  NaN -0.378513
8    8  NaN  2.434439
9    9  3.7  3.622127
10  10  NaN  1.826826
11  11  NaN -0.899136
12  12 -1.4 -1.439532
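
Since you mention a datetime column with a reading every 15 minutes, the same approach can be sketched with elapsed hours as x data. This is only an illustration under stated assumptions: the column names `time` and `level`, the synthetic two-day series, and the starting period of 12.42 hours (roughly the main semidiurnal tide) are all made up for the example and are not taken from your data. Passing a starting guess `p0` to `curve_fit` is optional, but sine fits tend to be sensitive to the initial frequency and phase.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def fit_func(x, ampl, freq, phase, offset):
    return ampl * np.sin(freq * x + phase) + offset

# Hypothetical data: one row every 15 minutes for two days
times = pd.date_range("2018-02-01", periods=192, freq="15min")
hours = np.arange(len(times)) * 0.25
true_level = fit_func(hours, 2.5, 2 * np.pi / 12.42, -1.0, 1.0)

# Keep only the turning points (highs and lows), like the column in the question
slope = np.diff(true_level)
turning = np.where(slope[:-1] * slope[1:] < 0)[0] + 1
level = np.full(len(times), np.nan)
level[turning] = true_level[turning]
df = pd.DataFrame({"time": times, "level": level})

# Convert the datetime column to elapsed hours so it can serve as x data
df["hours"] = (df["time"] - df["time"].iloc[0]).dt.total_seconds() / 3600.0
peaks = df.dropna(subset=["level"])

# Starting guess (assumption): semidiurnal period, amplitude and offset from
# the observed extremes, phase chosen so the first high tide sits at sin = 1
ampl0 = (peaks["level"].max() - peaks["level"].min()) / 2
offset0 = (peaks["level"].max() + peaks["level"].min()) / 2
freq0 = 2 * np.pi / 12.42
first_high = peaks.loc[peaks["level"] > offset0, "hours"].iloc[0]
phase0 = np.pi / 2 - freq0 * first_high

coeff, _cov = curve_fit(
    fit_func,
    peaks["hours"].to_numpy(),
    peaks["level"].to_numpy(),
    p0=[ampl0, freq0, phase0, offset0],
)

# Fitted curve for every row; measured peaks are kept, only the gaps are filled
df["fitted"] = fit_func(df["hours"], *coeff)
df["filled"] = df["level"].fillna(df["fitted"])
```

The `filled` column keeps your measured highs and lows and uses the fitted curve only where the original value is missing. Real tides are not a single sine wave, so treat the result as a rough approximation.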
Mr. T