
I'm a pandas/numpy noob with a small problem that I keep going around in circles trying to solve.

I have a simple data structure loaded from a CSV, which I have been able to sort by PADD_NAME and image_date:

PADD_NAME __fid__ count image_date majority max mean median min minority range std sum unique
1662 Big Pan 3 3037 19800201 14.4 14.400000 14.400000 13.600000 13.600000 13.6 0 0.000001 41303.199219 1
2229 Big Pan 3 3037 19800301 11.9 11.900000 11.900002 14.400000 14.400000 14.4 0 0.000001 43732.800781 1
4539 Big Pan 3 3037 19800401 7.2 7.200000 7.200000 11.900000 11.900000 11.9 0 0.000002 36140.304688 1
2607 Big Pan 3 3037 19800501 18.3 18.299999 18.300001 7.200000 7.200000 7.2 0 0.000000 21866.400391 1
5799 Big Pan 3 3037 19800101 13.6 13.600000 13.600000 18.299999 18.299999 18.3 0 0.000002 55577.101562 1

I would simply like to add a column and populate it with the rolling sum of the last three values of the mean column (the current row plus the two before it):

PADD_NAME __fid__ count image_date majority max mean median min minority range std sum unique sum_mean_last3 
1662 Big Pan 3 3037 19800201 14.4 14.400000 14.400000 13.600000 13.600000 13.6 0 0.000001 41303.199219 1
2229 Big Pan 3 3037 19800301 11.9 11.900000 11.900002 14.400000 14.400000 14.4 0 0.000001 43732.800781 1 
4539 Big Pan 3 3037 19800401 7.2 7.200000 7.200000 11.900000 11.900000 11.9 0 0.000002 36140.304688 1 33.5
2607 Big Pan 3 3037 19800501 18.3 18.299999 18.300001 7.200000 7.200000 7.2 0 0.000000 21866.400391 1 37.4
5799 Big Pan 3 3037 19800101 13.6 13.600000 13.600000 18.299999 18.299999 18.3 0 0.000002 55577.101562 1 39.1

The mean values are a measure of monthly ground cover (at time image_date), and I am looking to generate "seasonal" values (summer, autumn, ...). I realise the seasonal sums should start at the correct month, but getting this first step working would be a great help.


I have found and tried a few 'recipes' for sort-of-similar problems, but have got nowhere except confused.

Thanks in advance for any advice!

djb
You'll probably want to check out the [``rolling_sum``](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html) function. – jakevdp Oct 26 '15 at 03:45

1 Answer


You're looking for the pandas.rolling_sum() function:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'mean': 30 * np.random.random(5)})
>>> df['mean_sum'] = pd.rolling_sum(df['mean'], 3)
>>> df
        mean   mean_sum
0  22.987677        NaN
1   3.478543        NaN
2  11.923960  38.390181
3   1.545712  16.948215
4   1.452240  14.921912
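
Note that the top-level pd.rolling_sum() function was deprecated in pandas 0.18 and later removed, so on a recent pandas the same calculation is written with the .rolling() method. The following is a minimal sketch rather than a drop-in solution: it reuses the five mean values from the question in the order shown there, and adds a made-up second site ("Small Pan", purely hypothetical) to show how the window can be kept from running across different PADD_NAME groups.

```python
import pandas as pd

# 'mean' values from the question, in the row order shown there.
# "Small Pan" and its values are hypothetical, added only to
# illustrate grouping by site.
df = pd.DataFrame({
    "PADD_NAME": ["Big Pan 3"] * 5 + ["Small Pan"] * 3,
    "mean": [14.4, 11.9, 7.2, 18.3, 13.6, 1.0, 2.0, 3.0],
})

# Rolling 3-row sum computed separately within each site, so the
# first two rows of every site are NaN (not enough history yet).
df["sum_mean_last3"] = (
    df.groupby("PADD_NAME")["mean"]
      .transform(lambda s: s.rolling(window=3).sum())
)

print(df)
```

For a single site, df['mean'].rolling(3).sum() on its own is enough. In either case the frame should already be sorted by image_date within each site, since the window is taken over row order.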
jakevdp