1

I have a dataset whose structure is : Date Profit

A sample of the dataset is:

   Date     Profit
2013-06-21   14
2013-06-22   19
2013-06-23   11
2013-06-24   13
2013-06-25   6
2013-06-26   22
2013-06-27   22
2013-06-28   3
2013-06-29   5
2013-06-30   10
2013-07-01   17
2013-07-02   14
2013-07-03   9
2013-07-04   7

Sample input is :

data = [('2013-06-21',14),
    ('2013-06-22',19),
    ('2013-06-23',11),
    ('2013-06-24',13),
    ('2013-06-25',6),
    ('2013-06-26',22),
    ('2013-06-27',22),
    ('2013-06-28',3),
    ('2013-06-29',5),
    ('2013-06-30',10),
    ('2013-07-01',17),
    ('2013-07-02',14),
    ('2013-07-03',9),
    ('2013-07-04',7)]

Now I want to do a rolling aggregation and store the aggregation. By rolling aggregation I mean that say for week 1 (2013-06-21 to 2013-06-27) I want to add the profit of previous date(s) and store it with current date. So for 2013-06-21 the sum will be 14 only as it is first day of week but then for 2013-06-22 it should be sum of previous date (2013-06-21) and current date (2013-06-22) and this should be stored along with current date. This will go on till end of week and then for next week it will again start over with no previous date for new week. So for first week the sample output should be something like this:

 Date     Profit
2013-06-21   14
2013-06-22   33  #(14 + 19)
2013-06-23   44  #(33 + 11)
2013-06-24   57  #(44 + 13) 
2013-06-25   63  #(57 + 6)
2013-06-26   85  #(63 + 22)
2013-06-27   107 #(85 + 22)

I tried to look at defaultdict and did this:

def aggregate(data, key, value, func):
    measures_dict = collections.defaultdict(list)
    for k,v in zip(data[key], data[value]):
        measures_dict[k].append(v)

return [(k,func(measures_dict[k])) for k in measures_dict.keys()] 

but I am not getting the result and think that defaultdict is not the right way to do it. I also looked at pandas but I am unable to get the way to start to do this. Can anyone help me in doing this rolling aggregation?

Jason Donnald
  • 2,256
  • 9
  • 36
  • 49
  • 1
    It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. – Martijn Pieters Feb 09 '15 at 20:42
  • Also, what defines a week; is Monday or is Sunday the start of a week? – Martijn Pieters Feb 09 '15 at 20:43
  • And what format is this data in? Is it sorted? Are the dates already parsed into objects or are they simply strings? – Martijn Pieters Feb 09 '15 at 20:43
  • @MartijnPieters I am new to python so I am unable to go ahead with any way on this. The start of the week is not defined by Monday or any other day. It is actually defined a start date. Also the dates are simple strings – Jason Donnald Feb 09 '15 at 20:47
  • @MartijnPieters I did tried to look into details of `defaultdict` but was unable to get the idea of how to use it in my case – Jason Donnald Feb 09 '15 at 20:49
  • So every 7 days *from the start date* is defined as a week? – Martijn Pieters Feb 09 '15 at 20:57
  • @MartijnPieters yes every 7days from start date is a week – Jason Donnald Feb 09 '15 at 21:00
  • The pandas tag is crucial here; I'm not familiar with the tool so I probably cannot help. – Martijn Pieters Feb 09 '15 at 21:04
  • @MartijnPieters I was looking into pandas so I added it to the tag that if someone knows a way to do this through pandas then it would be helpful but doing it through pandas is not a requirement. Any way of solutions is helpful for me – Jason Donnald Feb 09 '15 at 21:08
  • Thing is, Stack Overflow is not a coding service. We can help people solve problems they are facing *as they themselves program*. You appear to think that Stack Overflow is here to write code for you, but you have underspecified this and haven't shown any efforts to solve this yourself so we can help you fix that code. – Martijn Pieters Feb 09 '15 at 21:13
  • @MartijnPieters I appreciate your suggestion. I tried some code but I am not getting the result. Please see my updated post – Jason Donnald Feb 09 '15 at 21:27
  • Please do give us *proper input*, something we can call your function with. The function's expected output would be helpful too, as well as whatever you get now (errors, in full, or incorrect output). – Martijn Pieters Feb 09 '15 at 21:29
  • @MartijnPieters I have given sample input and output in my post above.Due to some confidentiality reasons I am not allowed to post real data – Jason Donnald Feb 09 '15 at 21:31
  • You have given us some text. I don't see any Python structures; we'd have to parse that text now and build something that your function could accept. You don't have to give us real data, just *enough* to help us help you. – Martijn Pieters Feb 09 '15 at 21:33
  • @MartijnPieters please see the updated post above – Jason Donnald Feb 09 '15 at 21:42

2 Answers2

6

See this answer: Cumulative sum and percentage on column?

and this: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors and this: http://pandas.pydata.org/pandas-docs/stable/groupby.html

Updated for Weekly Cumulative:

df = pd.DataFrame(data)
df.columns = ['Date','Profit']
df['Date'] = pd.to_datetime(df['Date'])
df['weekofyear'] = df['Date'].dt.weekofyear
df.reset_index('Date')
df.sort_index(inplace=True)
df['Weekly_Cum'] = df.groupby('weekofyear').cumsum()

Output:

         Date  Profit  weekofyear  Weekly_Cum
0  2013-06-21      14          25          14
1  2013-06-22      19          25          33
2  2013-06-23      11          25          44
3  2013-06-24      13          26          13
4  2013-06-25       6          26          19
5  2013-06-26      22          26          41
6  2013-06-27      22          26          63
7  2013-06-28       3          26          66
8  2013-06-29       5          26          71
9  2013-06-30      10          26          81
10 2013-07-01      17          27          17
11 2013-07-02      14          27          31
12 2013-07-03       9          27          40
13 2013-07-04       7          27          47
Community
  • 1
  • 1
Liam Foley
  • 7,432
  • 2
  • 26
  • 24
  • I want to do rolling sum per week. so for the week of 2013-06-21 to 2013-06-27 there would be one rolling sum but then from 2013-06-28 it should start from fresh and not take the previous value of 2013-06-27 into account – Jason Donnald Feb 10 '15 at 02:10
  • @JasonDonnald: perhaps the [documentation can help here](http://pandas.pydata.org/pandas-docs/dev/computation.html#moving-rolling-statistics-moments)? – Martijn Pieters Feb 10 '15 at 14:39
  • @JasonDonnald Just updated for weekly Cumulatives. I did read through the original comments and do I think you could have done some more searching first, but you did post code to create the df which I appreciate. And I like solving problems I wouldn't run into in my own work for fake internet points. – Liam Foley Feb 10 '15 at 19:30
0

Just to a small fix in the @liam-foley answer:

df['year'] = df.index.year
df['weekofyear'] = df.index.weekofyear
df['Weekly_Cum'] = df.groupby(['year', 'weekofyear']).cumsum()

Otherwise the cumsum will calculate the same weekday from all years in the index.

John
  • 170
  • 1
  • 5