0

i'm using python and i need to fill the date gaps with NaN values, my data looks like this:

"Date & Time","High Temp - °C","Low Temp - °C"
"12/4/19 00:00","0.0","-0.1"
"12/4/19 00:05","-0.1","-0.1"
"12/4/19 00:10","0.1","-0.1"
"12/4/19 00:25","0.1","0.1"
"12/4/19 00:30","0.2","0.1"

and i would like to have them like this:

"Date & Time","High Temp - °C","Low Temp - °C"
"12/4/19 00:00","0.0","-0.1"
"12/4/19 00:05","-0.1","-0.1"
"12/4/19 00:10","0.1","-0.1"
"12/4/19 00:15","NaN","NaN"
"12/4/19 00:20","NaN","NaN"
"12/4/19 00:25","0.1","0.1"
"12/4/19 00:30","0.2","0.1"

after that i would like to interpolate the data to substitute the missing values.

what i have tried is:

#%%
from pathlib import Path
import pandas as pd

data=pd.read_csv(Path().joinpath('C:....d_data\\..._data.csv'))


data['Date & Time']=pd.to_datetime(data['Date & Time'],format='%m/%d/%Y %hh:%mm')
data = data.sort_values(by=['Date & Time'], ascending=[True])
data.set_index('Date & Time', inplace=True)
print (data)
IMSoP
  • 89,526
  • 13
  • 117
  • 169
Davide
  • 3
  • 2
  • What language are you writing this in? What have you tried? How many rows do you need? What do you mean by adding NaN and "after that" interpolating the values? – IMSoP Aug 21 '20 at 09:02
  • i'm writing it in python. I need more than 20000 rows. i need to interpolate the missing values with 5 min resolution – Davide Aug 21 '20 at 09:31
  • Then please [edit] your question to explain that, and to include the Python code you've written so far, and details of what the current result is. Note that this is not a code-writing service, so you need to make an attempt and ask for specific help. – IMSoP Aug 21 '20 at 09:35

1 Answers1

0

You could resample the data as the following :

import pandas as pd

# reading the csv 
df = pd.read_csv('test.csv',parse_dates=['Date & Time'],index_col=0)

# resampling the data for every 5 min
df = df.resample('5T').mean()
print(df)

Output :

 Date & Time               High Temp - °C  Low Temp - °C                        
 2019-12-04 00:00:00             0.0           -0.1
 2019-12-04 00:05:00            -0.1           -0.1
 2019-12-04 00:10:00             0.1           -0.1
 2019-12-04 00:15:00             NaN            NaN
 2019-12-04 00:20:00             NaN            NaN
 2019-12-04 00:25:00             0.1            0.1
 2019-12-04 00:30:00             0.2            0.1
Grayrigel
  • 3,474
  • 5
  • 14
  • 32
  • Thanks, when i run the code i get this error: "pandas.core.base.DataError: No numeric types to aggregate" it seems that is not recognizing some inputs. When i open the file in Excel i have different date formats, example: 16.00 and 04.00PM – Davide Aug 21 '20 at 13:48
  • For me, it works fine. May be you could add `df = df.astype('float')` above `df = df.resample('5T').mean()`. You could use the following the change the datetime format in `pandas` (https://stackoverflow.com/questions/38067704/how-to-change-the-datetime-format-in-pandas) – Grayrigel Aug 21 '20 at 14:55
  • Thanks Vikas, still not working. The problem is that i can convert some dates into numbers but not all of them. So i have this error: "ValueError: could not convert string to float: --" – Davide Aug 24 '20 at 10:53
  • Can you share/link of a screenshot of `print(df.info())`. – Grayrigel Aug 24 '20 at 12:02
  • The problem is related to the data format. How can i change the data format? in the original data is dd/mm/yyyy hh/mm. But when i resample it reads it like mm/dd/yyyy hh/mm thus creating a wrong output. – Davide Aug 24 '20 at 13:46
  • Maybe not. I think it is pandas way to read datetime. if the datatype is `datetime64[ns]`, then it shouldn't be a problem. My concern is that some of your data in not being read numeric(int or float) and becoming string. That's why I asked `print(df.info())`, which will tell us the datatype of each column. – Grayrigel Aug 24 '20 at 19:39