I have a really long list (10 years) of hourly values and I would like to average column 3, per day. Such that each date will have an average value derived from 3rd column.
My data loooks like this:
> 1/1/2005,16:00:00,83.3971,-3.8950
> 1/1/2005,17:00:00,0.0000,-3.9146
> 1/1/2005,18:00:00,0.0000,-3.9337
> 1/1/2005,19:00:00,0.0000,-3.9532
> 1/1/2005,20:00:00,0.0000,-3.9727
> 1/1/2005,21:00:00,0.0000,-3.9920
> 1/1/2005,22:00:00,0.0000,-4.0116
> 1/1/2005,23:00:00,0.0000,-4.0311
> 1/2/2005,0:00:00,0.0000,-4.0503
> 1/2/2005,1:00:00,0.0000,-4.0697
> 1/2/2005,2:00:00,0.0000,-4.0891
> 1/2/2005,3:00:00,0.0000,-4.1083
> 1/2/2005,4:00:00,0.0000,-4.1279
> 1/2/2005,5:00:00,0.0000,-4.1472
> 1/2/2005,6:00:00,0.0000,-4.1662
> 1/2/2005,7:00:00,0.0000,-4.1858
> 1/2/2005,8:00:00,0.0000,-4.2053
> 1/2/2005,9:00:00,152.7058,-4.2242
> 1/2/2005,10:00:00,302.6400,-4.2436
> 1/2/2005,11:00:00,405.2218,-4.2630
> 1/2/2005,12:00:00,452.6208,-4.2821
> 1/2/2005,13:00:00,441.4662,-4.3016
> 1/2/2005,14:00:00,372.5459,-4.3208
> 1/2/2005,15:00:00,250.8291,-4.3398
> 1/2/2005,16:00:00,86.6172,-4.3592
> 1/2/2005,17:00:00,0.0000,-4.3785
> 1/2/2005,18:00:00,0.0000,-4.3973
> 1/2/2005,19:00:00,0.0000,-4.4167
>...
12/30/2014,23:00:00,0.0000,0.7601 12/31/2014,0:00:00,0.0000,0.7601 12/31/2014,1:00:00,0.0000,0.7601 12/31/2014,2:00:00,0.0000,0.7601 12/31/2014,3:00:00,0.0000,0.7601 12/31/2014,4:00:00,0.0000,0.7601 12/31/2014,5:00:00,0.0000,0.7601 12/31/2014,6:00:00,0.0000,0.7601 12/31/2014,7:00:00,0.0000,0.7601 12/31/2014,8:00:00,0.0000,2.6808 12/31/2014,9:00:00,153.8084,1.6338 12/31/2014,10:00:00,301.9711,1.3491 12/31/2014,11:00:00,402.5888,1.2512 12/31/2014,12:00:00,447.9860,1.2191 12/31/2014,13:00:00,434.9283,1.2277
...
This may be an excellent opportunity to highlight the "Split, Apply, Combine" premise and with a simple case use?
Perhaps Read csv into pandas, index as a datetime object, then groupby day, aggregate sum/divide by count (aka average)?
QUESTION: I need the average daily value and I am starting with the above 10-year, hourly time series. As in, I have an hourly dataset going from Jan 1 2005 to Dec 31 2014, and I want the average daily value based on the 10 years of daily averages from that dataset. You dig?
I have already gone from hourly to daily using:
df = pd.read_csv('file.csv', parse_dates='datetime':0,1]},index_col='datetime', header=True, usecols=[0,1,2])
day_avgs = df.groupby(pd.TimeGrouper('D'))
This returns average daily values, indeed, see below:
date
2005-01-01 106.307291
2005-01-02 102.578729
2005-01-03 103.332883
2005-01-04 104.139979
2005-01-05 104.999592
... ...
2014-12-02 108.292092
2014-12-03 107.189729
2014-12-04 106.142721
2014-12-05 105.151696
However, I am stumped as to how I can group these daily values in "day_avgs", so group on each date (10 of them) and then average to give one daily average that is the average of all of those individual dates over the 10 year dataset. Capiche?
ie, I would like to have the average of every day (365) in a year, based on the 10 years of daily averages.