I'm trying to generate a cumulative column between any two given datetime ranges using a pandas pivot table, however it's unclear how to actually achieve this. I can create one for all dates as per below.
All datetimes:
sum count cum_sum
dt
2015-01-01 10:00:00 10 10
2015-01-01 12:00:00 20 30
2015-01-01 13:00:00 30 60
2015-01-02 10:00:00 10 70
2015-01-02 12:00:00 20 90
2015-01-02 13:00:00 30 120
Between two specified datetimes:
sum count cum_sum
dt
2015-01-01 12:00:00 20 30
2015-01-01 13:00:00 30 60
2015-01-02 10:00:00 10 70
2015-01-02 12:00:00 20 90
Is there a way to generate the table above, but start the accumulation from the start date in the query (or do this in the dataframe itself?).
my code:
import pandas as pd
import numpy as np
from datetime import datetime
data=[
{'count': 10, 'dt': datetime.strptime("20150101 10:00", "%Y%m%d %H:%M") },
{'count': 20, 'dt': datetime.strptime("20150101 12:00", "%Y%m%d %H:%M") },
{'count': 30, 'dt': datetime.strptime("20150101 13:00", "%Y%m%d %H:%M") },
{'count': 10, 'dt': datetime.strptime("20150102 10:00", "%Y%m%d %H:%M") },
{'count': 20, 'dt': datetime.strptime("20150102 12:00", "%Y%m%d %H:%M") },
{'count': 30, 'dt': datetime.strptime("20150102 13:00", "%Y%m%d %H:%M") }
]
df = pd.DataFrame(data)
df['cum_sum']=df['count'].cumsum()
pivot=pd.pivot_table(df, index=['dt'],aggfunc=[np.sum])
print (pivot)
result = pivot.query('dt >= "{0}" and dt <=" {1}"'.format(
datetime.strptime("20150101 11:00", "%Y%m%d %H:%M"),
datetime.strptime("20150102 12:00", "%Y%m%d %H:%M")
))
print (result)
Edit: I want to create a cumulative column between 2 date ranges but have sub-criteria.
data=[
{'loc': 'Japan', 'count': 10, 'dt': datetime.strptime("20150101 10:00", "%Y%m%d %H:%M") },
{'loc': 'Japan', 'count': 20, 'dt': datetime.strptime("20150101 12:00", "%Y%m%d %H:%M") },
{'loc': 'Japan', 'count': 30, 'dt': datetime.strptime("20150101 13:00", "%Y%m%d %H:%M") },
{'loc': 'London', 'count': 10, 'dt': datetime.strptime("20150102 10:00", "%Y%m%d %H:%M") },
{'loc': 'London', 'count': 20, 'dt': datetime.strptime("20150102 12:00", "%Y%m%d %H:%M") },
{'loc': 'NewYork', 'count': 30, 'dt': datetime.strptime("20150102 13:00", "%Y%m%d %H:%M") }
]
so the output would be for a particular datetime range:
Loc Count cum_sum
Japan
2015-01-01 10:00:00 10 10
2015-01-01 13:00:00 30 40
2015-01-02 13:00:00 30 70
London
2015-01-01 12:00:00 20 20
2015-01-02 10:00:00 10 20
2015-01-02 12:00:00 20 40