
I have stumbled upon this thread:

Monthly Averages Using Daily Data Using Python Pandas

which has answered most of my question. The remaining part is: how can I line the results up with the original data, like this:

1979    1   1   3
1979    1   2   3
1979    1   3   3
1979    1   4   3
1979    1   5   3
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    1   9   7
1979    1   10  8
1979    1   11  16
1979    1   12  16
1979    1   13  16
1979    1   14  18
1979    1   15  18
1979    1   16  18
1979    1   17  18
1979    1   18  20
1979    1   19  20
1979    1   20  20
1979    1   21  20
1979    1   22  20
1979    1   23  18
1979    1   24  18
1979    1   25  18
1979    1   26  18
1979    1   27  18
1979    1   28  18
1979    1   29  18
1979    1   30  18
1979    1   31  19  13.709677
1979    2   1   19
1979    2   2   19
1979    2   3   19
1979    2   4   19
1979    2   5   19
1979    2   6   22
1979    2   7   24
1979    2   8   27
1979    2   9   29
1979    2   10  32
1979    2   11  32
1979    2   12  32
1979    2   13  32
1979    2   14  33
1979    2   15  33
1979    2   16  33
1979    2   17  34
1979    2   18  36
1979    2   19  36
1979    2   20  36
1979    2   21  36
1979    2   22  36
1979    2   23  36
1979    2   24  31
1979    2   25  29
1979    2   26  27
1979    2   27  27
1979    2   28  27  29.107143

Thanks in advance

Jason Tam
  • You want to know how to erase every column value except the last one? – cs95 Aug 24 '17 at 10:01
  • @cᴏʟᴅsᴘᴇᴇᴅ I am after a way to line up the periodic results. Let's say this was 5-min data rather than monthly data: I would loop through the entire set and only assign a value (the average of the past hour) in the hourly-average column whenever the minute column is zero. – Jason Tam Aug 24 '17 at 20:32
  • Already have, thanks. This method essentially only leaves behind the last of the duplicated values, whereas I was looking for a more general approach that only assigns a value when a condition is met using values from multiple columns. Perhaps I should start another thread for that, cheers – Jason Tam Aug 24 '17 at 21:09
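
For the general pattern described in that last comment (broadcast a group average, then write it only on rows where a condition on other columns holds), here is a minimal sketch using made-up 5-minute data and column names, simplified to the current hour's mean rather than the trailing hour:

import pandas as pd

# Made-up 5-minute readings; these column names exist only for this example.
df5 = pd.DataFrame({
    "hour":   [0, 0, 0, 1, 1, 1],
    "minute": [0, 5, 10, 0, 5, 10],
    "value":  [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Broadcast each hour's mean onto every row of that hour,
# then keep it only where the condition (minute == 0) is met.
hourly_mean = df5.groupby("hour")["value"].transform("mean")
df5["hourly_avg"] = hourly_mean.where(df5["minute"] == 0, "")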

2 Answers


You can use GroupBy.transform to broadcast each month's mean onto its rows, then Series.duplicated to blank out every value except the last one in each month.

In [198]: df['monthly_avg'] = df.groupby(["year", "month"]).snow_depth.transform('mean')

In [205]: df.loc[df['monthly_avg'].duplicated(keep='last'), 'monthly_avg'] = ''

In [206]: df
Out[206]: 
    year  month  day  snow_depth monthly_avg
0   1979      1    1           3            
1   1979      1    2           3            
2   1979      1    3           3            
3   1979      1    4           3            
4   1979      1    5           3            
5   1979      1    6           3            
6   1979      1    7           4            
7   1979      1    8           5            
8   1979      1    9           7            
9   1979      1   10           8            
10  1979      1   11          16            
11  1979      1   12          16            
12  1979      1   13          16            
13  1979      1   14          18            
14  1979      1   15          18            
15  1979      1   16          18            
16  1979      1   17          18            
17  1979      1   18          20            
18  1979      1   19          20            
19  1979      1   20          20            
20  1979      1   21          20            
21  1979      1   22          20            
22  1979      1   23          18            
23  1979      1   24          18            
24  1979      1   25          18            
25  1979      1   26          18            
26  1979      1   27          18            
27  1979      1   28          18            
28  1979      1   29          18            
29  1979      1   30          18            
30  1979      1   31          19     13.7097
31  1979      2    1          19            
32  1979      2    2          19            
33  1979      2    3          19            
34  1979      2    4          19            
35  1979      2    5          19            
36  1979      2    6          22            
37  1979      2    7          24            
38  1979      2    8          27            
39  1979      2    9          29            
40  1979      2   10          32            
41  1979      2   11          32            
42  1979      2   12          32            
43  1979      2   13          32            
44  1979      2   14          33            
45  1979      2   15          33            
46  1979      2   16          33            
47  1979      2   17          34            
48  1979      2   18          36            
49  1979      2   19          36            
50  1979      2   20          36            
51  1979      2   21          36            
52  1979      2   22          36            
53  1979      2   23          36            
54  1979      2   24          31            
55  1979      2   25          29            
56  1979      2   26          27            
57  1979      2   27          27            
58  1979      2   28          27     29.1071
cs95

I took the answer from the original thread and adjusted it.

import pandas as pd

# Read in your file as a pandas.DataFrame,
# using 'one or more whitespace characters' as the separator
df = pd.read_csv("snow.txt", sep=r'\s+',
                 names=["year", "month", "day", "snow_depth"])

# Group data first by year, then by month
df['monthly_avg'] = df.groupby(["year", "month"])['snow_depth'].transform('mean')

Transform broadcasts the aggregated result back onto the rows of each group of the groupby, so every row of your df will carry its month's average.
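
As cs95 notes in the comment below, this by itself leaves the average on every row. If you want it shown only on each month's last day, as in the question's desired layout, one possible extra step (a minimal sketch reusing the df built above; last_of_month is just a name chosen here for the boolean mask) would be:

# Keep the monthly average only on each month's last row; blank it elsewhere.
last_of_month = ~df.duplicated(["year", "month"], keep="last")
df["monthly_avg"] = df["monthly_avg"].where(last_of_month, "")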

P.Tillmann
  • That's exactly my answer already, and besides, it needs more work to get the format the OP wanted. – cs95 Aug 24 '17 at 10:28