0

I've got a dataframe and want to resample certain columns (as hourly sums and means from 10-minutely data) WITHIN the 3 different 'users' that exist in the dataset.

A normal resample would use code like:

import pandas as pd
import numpy as np    

df = pd.read_csv('example.csv')

df['Datetime'] = pd.to_datetime(df['date_datetime/_source'] + ' ' + df['time']) #create datetime stamp
df.set_index(df['Datetime'], inplace = True)

df = df.resample('1H', how={'energy_kwh': np.sum, 'average_w': np.mean, 'norm_average_kw/kw': np.mean, 'temperature_degc': np.mean, 'voltage_v': np.mean})
df

To geta a result like (please forgive the column formatting, I have no idea how to paste this properly to make it look nice):

                   energy_kwh norm_average_kw/kw    voltage_v   temperature_degc    average_w
Datetime                    
2013-04-30 06:00:00  0.027   0.007333    266.333333  4.366667    30.000000
2013-04-30 07:00:00  1.250   0.052333    298.666667  5.300000    192.500000
2013-04-30 08:00:00  5.287   0.121417    302.333333  7.516667    444.000000
2013-04-30 09:00:00  12.449  0.201000    297.500000  9.683333    726.000000
2013-04-30 10:00:00  26.101  0.396417    288.166667  11.150000   1450.000000
2013-04-30 11:00:00  45.396  0.460250    282.333333  12.183333   1672.500000
2013-04-30 12:00:00  64.731  0.440833    276.166667  13.550000   1541.000000
2013-04-30 13:00:00  87.095  0.562750    284.833333  13.733333   2084.500000

However, in the original CSV, there is a column containing URLs - in the dataset of 100,000 rows, there are 3 different URLs (effectively IDs). I want to have each resampled individually rather than having a 'lump' resample from all (e.g. 9.00 AM on 2014-01-01 would have data for all 3 users, but each should have it's own hourly sums and means).

I hope this makes sense - please let me know if I need to clarify anything.

FYI, I tried using the advice in the following 2 posts but to no avail:

Resampling a multi-index DataFrame

Resampling Within a Pandas MultiIndex

Thanks in advance

Community
  • 1
  • 1

1 Answers1

0

You can resample a groupby object, groupby-ed by URLs, in this minimal example:

In [157]:

df=pd.DataFrame({'Val': np.random.random(100)})
df['Datetime'] = pd.date_range('2001-01-01', periods=100, freq='5H') #create random dataset
df.set_index(df['Datetime'], inplace = True)
df.__delitem__('Datetime')
df['Location']=np.tile(['l0', 'l1', 'l2', 'l3', 'l4'], 20)
In [158]:

print df.groupby('Location').resample('10D', how={'Val':np.mean})
                                   Val
Location Datetime                     
l0       2001-01-01 00:00:00  0.334183
         2001-01-11 00:00:00  0.584260
l1       2001-01-01 05:00:00  0.288290
         2001-01-11 05:00:00  0.470140
l2       2001-01-01 10:00:00  0.381273
         2001-01-11 10:00:00  0.461684
l3       2001-01-01 15:00:00  0.703523
         2001-01-11 15:00:00  0.386858
l4       2001-01-01 20:00:00  0.448857
         2001-01-11 20:00:00  0.310914
CT Zhu
  • 52,648
  • 17
  • 120
  • 133
  • this worked, thank you :) I didn't use the "df.__delitem__" command as I didn't see the need, but it still worked fine. – Bhajmeister Jul 07 '14 at 09:50