
I have stumbled upon this thread:

Monthly Averages Using Daily Data Using Python Pandas

which has answered most of my question. The remaining part is: how can I line the results up with the original data, like this:

1979    1   1   3
1979    1   2   3
1979    1   3   3
1979    1   4   3
1979    1   5   3
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    1   9   7
1979    1   10  8
1979    1   11  16
1979    1   12  16
1979    1   13  16
1979    1   14  18
1979    1   15  18
1979    1   16  18
1979    1   17  18
1979    1   18  20
1979    1   19  20
1979    1   20  20
1979    1   21  20
1979    1   22  20
1979    1   23  18
1979    1   24  18
1979    1   25  18
1979    1   26  18
1979    1   27  18
1979    1   28  18
1979    1   29  18
1979    1   30  18
1979    1   31  19  13.709677
1979    2   1   19
1979    2   2   19
1979    2   3   19
1979    2   4   19
1979    2   5   19
1979    2   6   22
1979    2   7   24
1979    2   8   27
1979    2   9   29
1979    2   10  32
1979    2   11  32
1979    2   12  32
1979    2   13  32
1979    2   14  33
1979    2   15  33
1979    2   16  33
1979    2   17  34
1979    2   18  36
1979    2   19  36
1979    2   20  36
1979    2   21  36
1979    2   22  36
1979    2   23  36
1979    2   24  31
1979    2   25  29
1979    2   26  27
1979    2   27  27
1979    2   28  27  29.107143

Thanks in advance

Jason Tam
  • You want to know how to erase every column value except the last one? – cs95 Aug 24 '17 at 10:01
  • @cᴏʟᴅsᴘᴇᴇᴅ I am after a way to line up the periodic results. Let's say this was 5-min data rather than monthly data: I would loop through the entire set and only assign a value (the average of the past hour) in the hourly-average column whenever the minute column is zero. – Jason Tam Aug 24 '17 at 20:32
  • Already have, thanks. This method essentially only leaves behind the last of the duplicated values, whereas I was looking for a more general approach that only assigns a value when a condition is met using values from multiple columns. Perhaps I should start another thread for that, cheers – Jason Tam Aug 24 '17 at 21:09
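
For the general pattern described in that last comment (broadcast a group average, then write it only on rows where a condition on other columns holds), here is a minimal sketch using made-up 5-minute data and column names, simplified to the current hour's mean rather than the trailing hour:

import pandas as pd

# Made-up 5-minute readings; these column names exist only for this example.
df5 = pd.DataFrame({
    "hour":   [0, 0, 0, 1, 1, 1],
    "minute": [0, 5, 10, 0, 5, 10],
    "value":  [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Broadcast each hour's mean onto every row of that hour,
# then keep it only where the condition (minute == 0) is met.
hourly_mean = df5.groupby("hour")["value"].transform("mean")
df5["hourly_avg"] = hourly_mean.where(df5["minute"] == 0, "")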

2 Answers


You can use GroupBy.transform to broadcast each month's mean onto its rows, then Series.duplicated to blank out every value except the last one in each month.

In [198]: df['monthly_avg'] = df.groupby(["year", "month"]).snow_depth.transform('mean')

In [205]: df.loc[df['monthly_avg'].duplicated(keep='last'), 'monthly_avg'] = ''

In [206]: df
Out[206]: 
    year  month  day  snow_depth monthly_avg
0   1979      1    1           3            
1   1979      1    2           3            
2   1979      1    3           3            
3   1979      1    4           3            
4   1979      1    5           3            
5   1979      1    6           3            
6   1979      1    7           4            
7   1979      1    8           5            
8   1979      1    9           7            
9   1979      1   10           8            
10  1979      1   11          16            
11  1979      1   12          16            
12  1979      1   13          16            
13  1979      1   14          18            
14  1979      1   15          18            
15  1979      1   16          18            
16  1979      1   17          18            
17  1979      1   18          20            
18  1979      1   19          20            
19  1979      1   20          20            
20  1979      1   21          20            
21  1979      1   22          20            
22  1979      1   23          18            
23  1979      1   24          18            
24  1979      1   25          18            
25  1979      1   26          18            
26  1979      1   27          18            
27  1979      1   28          18            
28  1979      1   29          18            
29  1979      1   30          18            
30  1979      1   31          19     13.7097
31  1979      2    1          19            
32  1979      2    2          19            
33  1979      2    3          19            
34  1979      2    4          19            
35  1979      2    5          19            
36  1979      2    6          22            
37  1979      2    7          24            
38  1979      2    8          27            
39  1979      2    9          29            
40  1979      2   10          32            
41  1979      2   11          32            
42  1979      2   12          32            
43  1979      2   13          32            
44  1979      2   14          33            
45  1979      2   15          33            
46  1979      2   16          33            
47  1979      2   17          34            
48  1979      2   18          36            
49  1979      2   19          36            
50  1979      2   20          36            
51  1979      2   21          36            
52  1979      2   22          36            
53  1979      2   23          36            
54  1979      2   24          31            
55  1979      2   25          29            
56  1979      2   26          27            
57  1979      2   27          27            
58  1979      2   28          27     29.1071
cs95

I took the answer from the original thread and adjusted it.

import pandas as pd

# Read in your file as a pandas.DataFrame,
# using 'one or more whitespace characters' as the separator
df = pd.read_csv("snow.txt", sep=r'\s+',
                 names=["year", "month", "day", "snow_depth"])

# Group data first by year, then by month
df['monthly_avg'] = df.groupby(["year", "month"])['snow_depth'].transform('mean')

Transform broadcasts the aggregated result back onto the rows of each group of the groupby, so every row of your df will carry its month's average.
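
As cs95 notes in the comment below, this by itself leaves the average on every row. If you want it shown only on each month's last day, as in the question's desired layout, one possible extra step (a minimal sketch reusing the df built above; last_of_month is just a name chosen here for the boolean mask) would be:

# Keep the monthly average only on each month's last row; blank it elsewhere.
last_of_month = ~df.duplicated(["year", "month"], keep="last")
df["monthly_avg"] = df["monthly_avg"].where(last_of_month, "")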

P.Tillmann
  • That's exactly my answer already, and besides, it needs more work to get the format the OP wanted. – cs95 Aug 24 '17 at 10:28