Mean of 30 most recent data points for each unique value of another column

Question

I am trying to get the mean of the 30 most recent points in column a (Value) for each type of product specified in column b (Product), where recency is determined by the date column c (Date).

So the average should be based on the 30 most recent points of each particular Product, as opposed to the 30 most recent data points of the whole DataFrame.

df:

Product            Value      Date
POL Mumbai         22.5       2015-6-26
STOLCO Finesse     55.5       2015-7-1
MPLR  Pure         85.0       2015-8-1

score 0 · Accepted Answer · edited May 23 '17 at 10:28


In general terms, you could group your DataFrame, assumed to be called df, by its column 'b' like so:

import pandas as pd

df['c'] = pd.to_datetime(df['c'])  # sort dates chronologically, not as text
products = df.groupby('b')

then iterate through each product group as follows:

mean = {}
for product, data in products:
    # newest rows first, keep at most 30, then average column 'a'
    mean[product] = data.sort_values('c', ascending=False).head(30)['a'].mean()
print(pd.DataFrame(list(mean.items()), columns=['Product', 'Mean']))

or

print(pd.Series(mean))
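For comparison, here is a minimal sketch of the same calculation without the explicit loop. It assumes the column names from the question's sample (Product, Value, Date) play the roles of b, a and c, and it uses a small hypothetical DataFrame rather than the asker's data, with n=2 instead of 30 to keep it short. Date is parsed with pd.to_datetime first because strings such as '2015-6-26' are not zero-padded and would otherwise sort as text rather than chronologically. Sorting oldest-first and then taking groupby(...).tail(n) keeps the n newest rows of each product, which is the per-product (rather than whole-DataFrame) window the question asks for.

```python
import pandas as pd


def mean_of_most_recent(df: pd.DataFrame, n: int = 30) -> pd.Series:
    """Mean of 'Value' over the n most recent rows of each 'Product'.

    Column names are assumed from the question's sample data.
    """
    out = df.copy()
    # Parse dates so sorting is chronological, not lexicographic.
    out["Date"] = pd.to_datetime(out["Date"])
    # Oldest-first, so tail(n) within each group keeps the n newest rows.
    recent = out.sort_values("Date").groupby("Product").tail(n)
    return recent.groupby("Product")["Value"].mean()


# Hypothetical example data (not from the question); n=2 keeps it short.
df = pd.DataFrame({
    "Product": ["POL Mumbai", "POL Mumbai", "POL Mumbai", "STOLCO Finesse"],
    "Value": [10.0, 20.0, 30.0, 55.5],
    "Date": ["2015-6-26", "2015-10-1", "2015-7-1", "2015-7-1"],
})
print(mean_of_most_recent(df, n=2))
```

With these hypothetical rows, POL Mumbai averages its two newest values (dated 2015-10-1 and 2015-7-1), giving 25.0; sorting the raw strings instead would have picked the rows dated 2015-6-26 and 2015-7-1.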

See here for details on the error you encountered.
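A minimal sketch of that error, under the assumption that it came from building a DataFrame directly from the mean dict (the exact call that failed is not shown in the thread): when every value in the dict is a scalar, pandas has no length from which to infer the rows, so the constructor raises a ValueError. The product names and numbers below are hypothetical. Turning the dict's items into rows, wrapping it in a Series, or supplying an explicit index all avoid it.

```python
import pandas as pd

# Hypothetical per-product means, standing in for the `mean` dict.
mean = {"POL Mumbai": 22.5, "STOLCO Finesse": 55.5}

error_message = ""
try:
    # A dict whose values are all scalars gives no row count, so this raises.
    pd.DataFrame(mean)
except ValueError as err:
    error_message = str(err)
    print(error_message)

# Fix 1: one row per (product, mean) pair.
as_frame = pd.DataFrame(list(mean.items()), columns=["Product", "Mean"])
# Fix 2: a Series indexed by product.
as_series = pd.Series(mean)
# Fix 3: keep the dict-of-scalars shape but pass an index (gives a single row).
one_row = pd.DataFrame(mean, index=[0])

print(as_frame)
print(as_series)
```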


answered Aug 12 '15 at 23:57

Stefan


Thanks Stefan, I tried this but I must be doing something wrong. Error is "If using all scalar values, you must pass an index". – pedramoh Aug 13 '15 at 16:35
Hi, could you pls share what your data, esp your 'products' look like? – Stefan Aug 13 '15 at 16:37
Just added example format in the original question. Cheers – pedramoh Aug 13 '15 at 16:50
Thanks - updated my answer, let's see if this works. – Stefan Aug 13 '15 at 17:25
