1

If I have a pandas database such as:

timestamp  label  value  new
   etc.      a     1      3.5
             b     2       5
             a     5      ...
             b     6      ...
             a     2      ...
             b     4      ...

I want the new column to be the average of the last two a's and the last two b's... so for the first it would be the average of 5 and 2 to get 3.5. It will be sorted by the timestamp. I know I could use a groupby to get the average of all the a's or all the b's but I'm not sure how to get an average of just the last two. I'm kinda new to python and coding so this might not be possible idk.

Edit: I should also mention this is not for a class or anything this is just for something I'm doing on my own and that this will be on a very large dataset. I'm just using this as an example. Also I would want each A and each B to have its own value for the last 2 average so the dimension of the new column will be the same as the others. So for the third line it would be the average of 2 and whatever the next a would be in the data set.

jrich
  • 45
  • 4

2 Answers2

0

Edited to reflect a change in the question specifying the last two, not the ones following the first, and that you wanted the same dimensionality with values repeated.

import pandas as pd

data = {'label': ['a','b','a','b','a','b'], 'value':[1,2,5,6,2,4]}

df = pd.DataFrame(data)
grouped = df.groupby('label')

results = {'label':[], 'tail_mean':[]}
for item, grp in grouped:
    subset_mean = grp.tail(2).mean()[0]
    results['label'].append(item)
    results['tail_mean'].append(subset_mean)

res_df = pd.DataFrame(results)
df = df.merge(res_df, on='label', how='left')

Outputs:

>> res_df
  label  tail_mean
0     a        3.5
1     b        5.0

>> df
  label  value  tail_mean
0     a      1        3.5
1     b      2        5.0
2     a      5        3.5
3     b      6        5.0
4     a      2        3.5
5     b      4        5.0

Now you have a dataframe of your results only, if you need them, plus a column with it merged back into the main dataframe. Someone else posted a more succinct way to get to the results dataframe; probably no reason to do it the longer way I showed here unless you also need to perform more operations like this that you could do inside the same loop.

Jeff
  • 2,158
  • 1
  • 16
  • 29
0

IIUC one way (among many) to do that:

In [139]: df.groupby('label').tail(2).groupby('label').mean().reset_index()
Out[139]:
  label  value
0     a    3.5
1     b    5.0
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Is there a way to make this iterate over the whole data frame. I want it to find the last two values average and then append that into a new column for each row. This is a very large data frame with a lot of different values, I just put that as an example. I've tried using nested for loops but I couldn't get it to work. – jrich Aug 18 '16 at 01:21
  • [How to make good reproducible pandas examples](http://stackoverflow.com/a/20159305/5741205) – MaxU - stand with Ukraine Aug 18 '16 at 08:12