Fill date gap with NaN and interpolate 5 min resolution data

Question

i'm using python and i need to fill the date gaps with NaN values, my data looks like this:

"Date & Time","High Temp - °C","Low Temp - °C"
"12/4/19 00:00","0.0","-0.1"
"12/4/19 00:05","-0.1","-0.1"
"12/4/19 00:10","0.1","-0.1"
"12/4/19 00:25","0.1","0.1"
"12/4/19 00:30","0.2","0.1"

and i would like to have them like this:

"Date & Time","High Temp - °C","Low Temp - °C"
"12/4/19 00:00","0.0","-0.1"
"12/4/19 00:05","-0.1","-0.1"
"12/4/19 00:10","0.1","-0.1"
"12/4/19 00:15","NaN","NaN"
"12/4/19 00:20","NaN","NaN"
"12/4/19 00:25","0.1","0.1"
"12/4/19 00:30","0.2","0.1"

after that i would like to interpolate the data to substitute the missing values.

what i have tried is:

#%%
from pathlib import Path
import pandas as pd

data=pd.read_csv(Path().joinpath('C:....d_data\\..._data.csv'))


data['Date & Time']=pd.to_datetime(data['Date & Time'],format='%m/%d/%Y %hh:%mm')
data = data.sort_values(by=['Date & Time'], ascending=[True])
data.set_index('Date & Time', inplace=True)
print (data)

What language are you writing this in? What have you tried? How many rows do you need? What do you mean by adding NaN and "after that" interpolating the values? — IMSoP, Aug 21 '20 at 09:02
i'm writing it in python. I need more than 20000 rows. i need to interpolate the missing values with 5 min resolution — Davide, Aug 21 '20 at 09:31
Then please [edit] your question to explain that, and to include the Python code you've written so far, and details of what the current result is. Note that this is not a code-writing service, so you need to make an attempt and ask for specific help. — IMSoP, Aug 21 '20 at 09:35

Grayrigel · Accepted Answer · 2020-08-21T12:39:03.323

0

You could resample the data as the following :

import pandas as pd

# reading the csv 
df = pd.read_csv('test.csv',parse_dates=['Date & Time'],index_col=0)

# resampling the data for every 5 min
df = df.resample('5T').mean()
print(df)

Output :

 Date & Time               High Temp - °C  Low Temp - °C                        
 2019-12-04 00:00:00             0.0           -0.1
 2019-12-04 00:05:00            -0.1           -0.1
 2019-12-04 00:10:00             0.1           -0.1
 2019-12-04 00:15:00             NaN            NaN
 2019-12-04 00:20:00             NaN            NaN
 2019-12-04 00:25:00             0.1            0.1
 2019-12-04 00:30:00             0.2            0.1

edited Aug 21 '20 at 12:39

answered Aug 21 '20 at 12:32

Grayrigel

3,474
5
14
32

Thanks, when i run the code i get this error: "pandas.core.base.DataError: No numeric types to aggregate" it seems that is not recognizing some inputs. When i open the file in Excel i have different date formats, example: 16.00 and 04.00PM – Davide Aug 21 '20 at 13:48
For me, it works fine. May be you could add `df = df.astype('float')` above `df = df.resample('5T').mean()`. You could use the following the change the datetime format in `pandas` (https://stackoverflow.com/questions/38067704/how-to-change-the-datetime-format-in-pandas) – Grayrigel Aug 21 '20 at 14:55
Thanks Vikas, still not working. The problem is that i can convert some dates into numbers but not all of them. So i have this error: "ValueError: could not convert string to float: --" – Davide Aug 24 '20 at 10:53
Can you share/link of a screenshot of `print(df.info())`. – Grayrigel Aug 24 '20 at 12:02
The problem is related to the data format. How can i change the data format? in the original data is dd/mm/yyyy hh/mm. But when i resample it reads it like mm/dd/yyyy hh/mm thus creating a wrong output. – Davide Aug 24 '20 at 13:46
Maybe not. I think it is pandas way to read datetime. if the datatype is `datetime64[ns]`, then it shouldn't be a problem. My concern is that some of your data in not being read numeric(int or float) and becoming string. That's why I asked `print(df.info())`, which will tell us the datatype of each column. – Grayrigel Aug 24 '20 at 19:39

Fill date gap with NaN and interpolate 5 min resolution data

1 Answers1