pandas dataframe, group by, rolling window, then statistics

Question

I have a DataFrame with two columns, A and B, all integers. In A they are repeated. I group by them, then sort by them, then I want to apply a rolling window over A to get the elements in column B grouped. How they are grouped is not crucial, anything will do, then I have to do a number of things over them, so I will most likely need to convert that collection into something else (a pandas.Series would probably be the best, concatenating groups in each window).

I think I have to start with something like this:

df.groupby('A').rolling(w)

but then how to get the elements?

I have tried to use apply, but I get only one window as a numpy array, not a window over groups.

Example:

In [1]: import pandas as pd
   ...: import numpy as np
   ...: import random
   ...: 
   ...: random.seed(2)
   ...: 
   ...: indexes = [i for i in range(1,100) for _ in range(10)]
   ...: dfi = pd.DataFrame({'A': indexes, 'B': [random.randint(1,99) for e in indexes]})
   ...: print(dfi.head()) # this is an input example
   A   B
0  1   8
1  1  12
2  1  11
3  1  47
4  1  22
In [2]: result = []
   ...: w = 3
   ...: for i in range(1,100):
   ...:     result.append({'A': i, 'B': np.array([e for j, e in dfi.values if abs(i-j) < w or abs(min(i,j)+99-max(i,j)
   ...: ) < w])})
   ...: dfo = pd.DataFrame(result) # this is the expected output, to be obtained with groupby and rolling operations
   ...: print(dfo.head())
   A                                                  B
0  1  [8, 12, 11, 47, 22, 95, 86, 40, 33, 78, 28, 78...
1  2  [8, 12, 11, 47, 22, 95, 86, 40, 33, 78, 28, 78...
2  3  [8, 12, 11, 47, 22, 95, 86, 40, 33, 78, 28, 78...
3  4  [28, 78, 5, 75, 88, 21, 56, 82, 51, 93, 66, 48...
4  5  [66, 48, 70, 57, 65, 35, 5, 4, 47, 60, 41, 49,...

Note that there are 50 elements in each array, 10 for each group, and then a window size of 5 (3 radius, non-inclusive)

Please add an example dataframe and expected output, see more information here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Erfan, Mar 14 '19 at 17:21
this isn't clear at all, please make a simple reproducible dataframe so that we can practice, and please give expected output. — Matt W., Mar 14 '19 at 17:38
See example with input DataFrame and output DataFrame here: http://dpaste.com/1EZTW1A I will convert to a rendered notebook when possible — Trylks, Mar 14 '19 at 19:00
I think I cannot do what I wanted to do, after hitting my head against a wall repeatedly, I was getting a very prevalent error (`cannot handle this type -> object`), apparently rolling is mostly for numbers, not lists, according to this: https://github.com/pandas-dev/pandas/issues/23002 I guess my only option is forgetting about pandas and trying to optimize python code... — Trylks, Mar 14 '19 at 23:16

Nathaniel · Accepted Answer · 2019-03-14T21:21:37.290

2

Here is a solution using aggregate:

dfo = dfi.groupby('A').agg(lambda x: list(x))
dfo.head()
                                               B
    A                                           
    1    [8, 12, 11, 47, 22, 95, 86, 40, 33, 78]
    2    [28, 78, 5, 75, 88, 21, 56, 82, 51, 93]
    3     [66, 48, 70, 57, 65, 35, 5, 4, 47, 60]
    4    [41, 49, 55, 68, 22, 72, 23, 31, 30, 4]
    5   [23, 42, 23, 18, 66, 66, 47, 66, 87, 72]

And you can return a Series instead of a DataFrame by accessing column 'B' in the DataFrameGroupBy object like this:

dfo = dfi.groupby('A')['B'].apply(lambda x: list(x))

edited Mar 14 '19 at 21:21

answered Mar 14 '19 at 20:56

Nathaniel

3,230
11
18

1

Added how to get series from grouper to Nathaniel's answer. – Rich Andrews Mar 14 '19 at 21:16
1

certainly I was missing the application of the list function, BTW it can be done like this: `dfi.groupby('A').agg(list)`, but after doing that I need to do a rolling window and aggregate the lists again – Trylks Mar 14 '19 at 22:12

pandas dataframe, group by, rolling window, then statistics

1 Answers1