
Here are the steps I've taken so far. I am trying to get the daily PM averages of my dataframe, with a column of values 'PM'.

import pandas as pd
import numpy as np
df_2018 = pd.read_csv('kath2018.csv')

My 'kath2018.csv' looks like this:

df_2018.head()

          Date  Year  Month  Day  Hour   PM
0  1/1/18 1:00  2018      1    1     1  131
1  1/1/18 2:00  2018      1    1     2   85
2  1/1/18 3:00  2018      1    1     3   74
3  1/1/18 4:00  2018      1    1     4   79
4  1/1/18 5:00  2018      1    1     5   85

I clean up the data by replacing the placeholder values that mark missing readings with NaN, and then using interpolate() to fill in the NaNs.

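# data has random -999 and 985 values; replace them with NaN
# (np.nan rather than np.NaN, since the np.NaN alias was removed in NumPy 2.0)
df_2018['PM'] = df_2018['PM'].replace([-999, 985], np.nan)
# fill the NaNs by linear interpolation between neighbouring hours
df_2018['PM'] = df_2018['PM'].interpolate()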
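
To make the cleanup step concrete, here is a minimal sketch on a tiny made-up series rather than the real CSV. The values and the name `pm` are placeholders chosen for illustration. It only shows what replace followed by the default (linear) interpolate does to isolated placeholder values.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly PM readings containing the two placeholder values.
pm = pd.Series([131.0, -999.0, 74.0, 985.0, 85.0])

# Turn the placeholders into NaN so they are treated as missing.
pm_clean = pm.replace([-999, 985], np.nan)

# Default interpolation is linear between the neighbouring valid readings.
pm_filled = pm_clean.interpolate()
```

In this sketch each gap is one hour wide, so the filled value is the midpoint of its two neighbours.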

Then, to get the daily average (my data is given in hourly intervals), I run the following code. It does exactly what it is supposed to: it groups the hourly values by day and gives the average.

df_2018['Date'] = pd.to_datetime(df_2018['Date'])
df_2018 = df_2018.groupby(pd.Grouper(freq='D', key='Date')).mean()
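
As a rough illustration of how this grouping treats a day with no hourly rows, here is a sketch on made-up data. The frame `df`, the constant values 80 and 120, and the three-day range are all assumptions, not my real file. In this sketch the day without data still gets its own daily bin, and its mean comes out as NaN, so it can be interpolated at the daily level.

```python
import pandas as pd

# Hypothetical hourly data: 2018-01-01 and 2018-01-03 present, 2018-01-02 absent.
hours = pd.date_range("2018-01-01", periods=72, freq="h")
hours = hours[hours.normalize() != pd.Timestamp("2018-01-02")]
df = pd.DataFrame({"Date": hours, "PM": 80.0})
df.loc[df["Date"] >= "2018-01-03", "PM"] = 120.0

# Same daily grouping as in the question, restricted to the PM column.
daily_pm = df.groupby(pd.Grouper(freq="D", key="Date"))["PM"].mean()

# Days with no hourly rows still appear as a bin; their mean is NaN.
empty_days = daily_pm[daily_pm.isna()].index

# Linear interpolation over the daily series fills those bins.
daily_filled = daily_pm.interpolate()
```

Selecting only the PM column before calling mean() keeps the Year, Month, Day and Hour columns out of the averaging.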

However, some days are missing from the data entirely. When I look at df_2018 now, the rows for the days that were completely missing look like this:

[image: current dataframe after groupby]

I cannot figure out how to go back into the dataframe and replace the empty cell under the PM column with NaN so that I can run the interpolation again.

Should I be 'going back' at all, or is there a way for me to find the missing days first, before running the interpolation and the groupby?
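
One possible way to look for the missing days up front is sketched below, again on made-up data. The frame `df` and the chosen dates are hypothetical. The idea is to compare the calendar days that actually occur in the Date column against a complete daily range between the first and last day.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data standing in for kath2018.csv: three days, with
# all of 2018-01-02 absent and one -999 placeholder on 2018-01-01.
hours = pd.date_range("2018-01-01 00:00", "2018-01-03 23:00", freq="h")
hours = hours[hours.normalize() != pd.Timestamp("2018-01-02")]
df = pd.DataFrame({"Date": hours, "PM": 100.0})
df.loc[3, "PM"] = -999

# Mark the placeholders as missing, but do not interpolate yet.
df["PM"] = df["PM"].replace([-999, 985], np.nan)

# Calendar days that have at least one row in the data.
observed_days = pd.DatetimeIndex(df["Date"].dt.normalize().unique())

# Every calendar day between the first and last observed day.
all_days = pd.date_range(observed_days.min(), observed_days.max(), freq="D")

# Days on the calendar with no rows at all.
missing_days = all_days.difference(observed_days)
```

Here `missing_days` lists the dates with no hourly rows, which could then be inspected or handled before any interpolation is applied.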

Scott
  • Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 21 '21 at 22:27
  • Specifically, the contents of `'kath2018.csv'` are unknown so it's difficult to determine where the issue may be. – Henry Ecker May 21 '21 at 22:28
  • @HenryEcker I have added a small subset of data using the .head() function. Is this adequate? – Scott May 24 '21 at 18:10
