6

I have a sequence of one month of data and I wanted the average value at 00:00, 00:30, 01:00, ...23:30 for the whole month. If this were in an hourly basis I could simply do

df.groupby(df.index.hour).mean()

but I have no idea how to do this for a 30 min period. I have tried random things like

df.groupby(df.index.hour*df.index.minute).mean()

but none of them has worked. Can this be done in Pandas?

Thank you.

EDIT Sample data:

2015-06-01 00:00:00     4.474450  137.007017
2015-06-01 00:30:00     5.661688  138.342549
2015-06-01 01:00:00     6.142984  139.469381
2015-06-01 01:30:00     6.245277  140.780341
2015-06-01 02:00:00     6.368909  141.464176
2015-06-01 02:30:00     6.535648  143.121590
...                          ...         ...
2015-06-04 21:30:00     6.380301  123.523559
2015-06-04 22:00:00     6.118872  124.649216
2015-06-04 22:30:00     6.554864  127.671638
2015-06-04 23:00:00     7.628708  129.960442
2015-06-04 23:30:00     8.082754  132.294248
2015-06-04 00:00:00     7.768733  132.960135

Note that the the data spans more than a day, but I am looking for the result in as a 24*2 length array. Since the 00:30 data will be the average of the data for all the days in that time and so on.

TomCho
  • 3,204
  • 6
  • 32
  • 83
  • 1
    It would help if you could post few rows of sample data too. – Zero Jun 01 '15 at 17:54
  • Well, I can't post my month data, It's too much data. If I were to make a mwe I'd just copy the panda's tutorial random arange, so I don't really see the point. I'll edit in a while with a few rows anyway. – TomCho Jun 01 '15 at 18:20
  • You want to [`resample`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html#pandas.DataFrame.resample) so something like `df.resample('30min')` should work – EdChum Jun 01 '15 at 18:29
  • @EdChum It's not resample. I want an average of all the cases where time is 00:30, or 01:00, or 01:30 for different days. Get it? – TomCho Jun 01 '15 at 18:38
  • So you want the average values for a given time at 30 min intervals? It looks like your date already has values for every 30 min intervals is this correct? You could do `df.groupby([df.index.hour, df.index.minute]).mean()` this will group on the 30 min intervals that are common across all days – EdChum Jun 01 '15 at 18:42
  • Yes, kind of like that. I still have to see if this works but it looks like it will. Unfortunately there's no official name for the kind of average I mean which makes it hard to explain, but sometimes it's referred to as multi-daily-mean with a 30min frequency. – TomCho Jun 01 '15 at 18:45
  • Can the person who gave the -1 please explain the reason? I honestly do not know why and I'd like to edit my question accordingly. – TomCho Jun 01 '15 at 19:17

3 Answers3

7

IIUC I think you want the following:

In [13]:
# load the data
t="""2015-06-01 00:00:00,4.474450,137.007017
2015-06-01 00:30:00,5.661688,138.342549
2015-06-01 01:00:00,6.142984,139.469381
2015-06-01 01:30:00,6.245277,140.780341
2015-06-01 02:00:00,6.368909,141.464176
2015-06-01 02:30:00,6.535648,143.121590
2015-06-04 21:30:00,6.380301,123.523559
2015-06-04 22:00:00,6.118872,124.649216
2015-06-04 22:30:00,6.554864,127.671638
2015-06-04 23:00:00,7.628708,129.960442
2015-06-04 23:30:00,8.082754,132.294248
2015-06-04 00:00:00,7.768733,132.960135"""
df = pd.read_csv(io.StringIO(t), index_col=[0], parse_dates=[0], header=None)
df.columns = ['x','y']

In [14]:
# group on the hour and minute attribute of the index
df.groupby([df.index.hour, df.index.minute]).mean()
Out[14]:
              x           y
0  0   6.121592  134.983576
   30  5.661688  138.342549
1  0   6.142984  139.469381
   30  6.245277  140.780341
2  0   6.368909  141.464176
   30  6.535648  143.121590
21 30  6.380301  123.523559
22 0   6.118872  124.649216
   30  6.554864  127.671638
23 0   7.628708  129.960442
   30  8.082754  132.294248

So the above will group on the index hour and minute attribute of your index, so this gives you the mean for all days that have a value at 00:30, 01:00 etc..

EdChum
  • 376,765
  • 198
  • 813
  • 562
2

Assuming you want the average value for each 30 min. period averaged over the month, try using resample (in case your data is not already evenly spaced at 30 minute intervals), and then use groupby.

df = pd.DataFrame(np.random.randn(5000), 
                  columns=['vals'],
                  index=pd.date_range(start=dt.datetime.now(), 
                  periods=5000,
                  freq='30T'))

df = df.resample('30Min')
>>> df.groupby(lambda x: (x.year, x.month, x.hour, x.minute)).vals.mean()
(2015, 6, 0, 0)    -0.120642
(2015, 6, 0, 30)    0.172788
(2015, 6, 1, 0)     0.310861
(2015, 6, 1, 30)   -0.054615
(2015, 6, 2, 0)    -0.122372
(2015, 6, 2, 30)   -0.160935
(2015, 6, 3, 0)     0.290064
(2015, 6, 3, 30)    0.040233
(2015, 6, 4, 0)    -0.267994
(2015, 6, 4, 30)    0.032256
(2015, 6, 5, 0)    -0.240584
(2015, 6, 5, 30)   -0.095077
(2015, 6, 6, 0)    -0.145298
(2015, 6, 6, 30)    0.311680
(2015, 6, 7, 0)    -0.259130
...
(2015, 9, 16, 30)   -0.249618
(2015, 9, 17, 0)     0.000566
(2015, 9, 17, 30)    0.085121
(2015, 9, 18, 0)    -0.008067
(2015, 9, 18, 30)   -0.392995
(2015, 9, 19, 0)    -0.509947
(2015, 9, 19, 30)    0.117550
(2015, 9, 20, 0)     0.076988
(2015, 9, 20, 30)   -0.096187
(2015, 9, 21, 0)    -0.066262
(2015, 9, 21, 30)   -0.274175
(2015, 9, 22, 0)    -0.459320
(2015, 9, 22, 30)    0.685940
(2015, 9, 23, 0)    -0.050148
(2015, 9, 23, 30)    0.038874
Name: a, Length: 192, dtype: float64

And to view the monthly averages:

df2 = df.reset_index()
df2['year'] = [c[0] for c in df2['index']]
df2['month'] = [c[1] for c in df2['index']]
df2['hour'] = [c[2] for c in df2['index']]
df2['minutes'] = [c[3] for c in df2['index']]

>>> df2.pivot_table(values='vals', 
                    columns=['year', 'month'], 
                    index=['hour', 'minutes'], 
                    aggfunc=np.mean)

year              2015                              
month                6         7         8         9
hour minutes                                        
0    0       -0.120642  0.260686  0.320550  0.374241
     30       0.172788 -0.078378  0.092206  0.151341
1    0        0.310861 -0.210523 -0.005879 -0.162668
     30      -0.054615  0.069194 -0.026174  0.218007
2    0       -0.122372  0.036491  0.266133  0.050847
     30      -0.160935  0.191182  0.205710  0.183733
3    0        0.290064  0.062863  0.042098 -0.167724
     30       0.040233 -0.083346  0.248039  0.654488
4    0       -0.267994 -0.304616 -0.227858 -0.306729
     30       0.032256  0.036278 -0.350544  0.111284
5    0       -0.240584  0.177614  0.174180 -0.156598
     30      -0.095077  0.350684  0.430140  0.050188
6    0       -0.145298  0.260356  0.314880 -0.367434
     30       0.311680 -0.307146 -0.024851 -0.012917
7    0       -0.259130 -0.030620  0.027398 -0.050143
     30       0.283149 -0.465681  0.067154 -0.118537
8    0        0.108188 -0.034551  0.206411 -0.325447
     30      -0.069086 -0.074594 -0.081681  0.087789
9    0       -0.115867  0.257696 -0.056953 -0.000636
     30      -0.194631 -0.018209  0.097634  0.321195
10   0       -0.029710 -0.179173 -0.362098 -0.425820
     30       0.171463 -0.275286  0.124837  0.185941
11   0        0.027725  0.116209  0.397818  0.273722
     30       0.045747  0.113604  0.053537  0.130483
12   0        0.397945  0.106375  0.316335  0.487824
     30      -0.133603  0.352268  0.043338 -0.080617
13   0       -0.152457  0.005833 -0.024060 -0.484102
     30       0.023435 -0.243851 -0.190029 -0.155168
14   0       -0.029532  0.020272  0.299358 -0.158454
     30       0.250930 -0.157656  0.007717 -0.088050
15   0       -0.098546  0.282827 -0.185139 -0.119801
     30      -0.145674 -0.047190 -0.078103 -0.116217
16   0        0.164972  0.190326  0.156651 -0.559833
     30      -0.034718 -0.273184 -0.254462 -0.249618
17   0        0.133240  0.071170 -0.200580  0.000566
     30      -0.030369  0.007821 -0.298061  0.085121
18   0        0.184950  0.013328  0.196898 -0.008067
     30       0.049239 -0.050993  0.008094 -0.392995
19   0       -0.067991 -0.011393 -0.101014 -0.509947
     30      -0.219792  0.098113 -0.297009  0.117550
20   0        0.174875 -0.253166 -0.130623  0.076988
     30      -0.407662 -0.221100  0.172923 -0.096187
21   0        0.041020 -0.381691 -0.090805 -0.066262
     30      -0.163835 -0.158566 -0.466063 -0.274175
22   0       -0.039960  0.400497  0.028426 -0.459320
     30       0.023610 -0.097154 -0.010363  0.685940
23   0       -0.261549  0.010280  0.019144 -0.050148
     30      -0.008354 -0.451011 -0.012453  0.038874
Alexander
  • 105,104
  • 32
  • 201
  • 196
1

Dataframe has a built in method for this called resample just do...

df.resample('30Min',how='mean')

actually 'how' defaults to mean so you can shorten it to

df.resample('30Min')
ZJS
  • 3,991
  • 2
  • 15
  • 22