3

I need to show the amount of calls that are made within every hour in an entire month. So far I could resample the CSV in the following way:

                       Amount
   Date
2017-03-01 00:00:00      5
2017-03-01 01:00:00      1
     .
     .
2017-03-31 22:00:00      7
2017-03-31 23:00:00      2

The date is a datetimeIndex y resampled all values in intervals of one hour. What I need is to be able to group by hour all rows, what i mean is to group all the calls that are made in every day of the month at 21:00,for example, sum the amount and show it in a single row. For example:

                     Amount
      Date
 2017-03  00:00:00     600
 2017-03  01:00:00     200
 2017-03  02:00:00      30
        .
        .
 2017-03  22:00:00     500
 2017-03  23:00:00     150
Cœur
  • 37,241
  • 25
  • 195
  • 267
  • You want to `groupby()` by `Date.dt.hour`. (Possibly make `Date` a column first by calling `reset_index()`). – DYZ May 05 '17 at 01:39

2 Answers2

2

Setup

import pandas as pd
import datetime as dt
idx = pd.date_range('2017-03-01 00:00:00', '2017-03-31 23:00:00', freq='H')
df = pd.DataFrame(index=idx, columns=['Amount'], data=np.random.randint(1,100,len(idx)))

Solution

#Convert index to a list of dates without days like (2017-03  00:00:00 )
group_date = [dt.datetime.strftime(e, '%Y-%m %H:%M:%S') for e in df.index]

#group the data by the new group_date
df.groupby(group_date)['Amount'].sum().to_frame()

Out[592]: 
                  Amount
2017-03 00:00:00    1310
2017-03 01:00:00    1339
2017-03 02:00:00    1438
2017-03 03:00:00    1660
2017-03 04:00:00    1503
2017-03 05:00:00    1466
2017-03 06:00:00    1380
2017-03 07:00:00    1720
2017-03 08:00:00    1399
2017-03 09:00:00    1633
2017-03 10:00:00    1632
2017-03 11:00:00    1706
2017-03 12:00:00    1526
2017-03 13:00:00    1433
2017-03 14:00:00    1577
2017-03 15:00:00    1061
2017-03 16:00:00    1769
2017-03 17:00:00    1449
2017-03 18:00:00    1621
2017-03 19:00:00    1602
2017-03 20:00:00    1541
2017-03 21:00:00    1409
2017-03 22:00:00    1711
2017-03 23:00:00    1313
Allen Qin
  • 19,507
  • 8
  • 51
  • 67
2

You can use DatetimeIndex.strftime with groupby and aggregating sum:

df1 = df.groupby(df.index.strftime('%Y-%m %H:%M:%S'))[['Amount']].sum()

#with borrowing sample from Allen
print (df1)
                  Amount
2017-03 00:00:00    1528
2017-03 01:00:00    1505
2017-03 02:00:00    1606
2017-03 03:00:00    1750
2017-03 04:00:00    1493
2017-03 05:00:00    1649
2017-03 06:00:00    1390
2017-03 07:00:00    1147
2017-03 08:00:00    1687
2017-03 09:00:00    1602
2017-03 10:00:00    1755
2017-03 11:00:00    1381
2017-03 12:00:00    1390
2017-03 13:00:00    1565
2017-03 14:00:00    1805
2017-03 15:00:00    1674
2017-03 16:00:00    1375
2017-03 17:00:00    1601
2017-03 18:00:00    1493
2017-03 19:00:00    1422
2017-03 20:00:00    1781
2017-03 21:00:00    1709
2017-03 22:00:00    1578
2017-03 23:00:00    1583

Another solution with DatetimeIndex.to_period and DatetimeIndex.hour. Then use groupby and sum and last create Index from MultiIndex with map:

a = df.index.to_period('M')
b = df.index.hour
df1 = df.groupby([a,b])[['Amount']].sum()
#http://stackoverflow.com/questions/17118071/python-add-leading-zeroes-using-str-format 
df1.index = df1.index.map(lambda x: '{0[0]} {0[1]:0>2}:00:00'.format(x))
print (df1)
                  Amount
2017-03 00:00:00    1739
2017-03 01:00:00    1502
2017-03 02:00:00    1585
2017-03 03:00:00    1710
2017-03 04:00:00    1679
2017-03 05:00:00    1371
2017-03 06:00:00    1489
2017-03 07:00:00    1252
2017-03 08:00:00    1540
2017-03 09:00:00    1443
2017-03 10:00:00    1589
2017-03 11:00:00    1499
2017-03 12:00:00    1837
2017-03 13:00:00    1834
2017-03 14:00:00    1695
2017-03 15:00:00    1616
2017-03 16:00:00    1499
2017-03 17:00:00    1329
2017-03 18:00:00    1727
2017-03 19:00:00    1764
2017-03 20:00:00    1754
2017-03 21:00:00    1621
2017-03 22:00:00    1486
2017-03 23:00:00    1672

Timings:

In [394]: %timeit (jez(df))
1 loop, best of 3: 630 ms per loop

In [395]: %timeit (df.groupby(df.index.strftime('%Y-%m %H:%M:%S'))[['Amount']].sum())
1 loop, best of 3: 792 ms per loop

#Allen's solution
In [396]: %timeit (df.groupby([dt.datetime.strftime(e, '%Y-%m %H:%M:%S') for e in df.index])['Amount'].sum().to_frame())
1 loop, best of 3: 663 ms per loop

Code for timings:

np.random.seed(100)
#[68712 rows x 1 columns]
idx = pd.date_range('2010-03-01 00:00:00', '2017-12-31 23:00:00', freq='H')
df = pd.DataFrame(index=idx, columns=['Amount'], data=np.random.randint(1,100,len(idx)))
print (df.head())

def jez(df):
    a = df.index.to_period('M')
    b = df.index.hour
    df1 = df.groupby([a,b])[['Amount']].sum()
    df1.index = df1.index.map(lambda x: '{0[0]} {0[1]:0>2}:00:00'.format(x))
    return (df1)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252