I have a DataFrame with two columns, A and B, all integers. In A they are repeated. I group by them, then sort by them, then I want to apply a rolling window over A to get the elements in column B grouped. How they are grouped is not crucial, anything will do, then I have to do a number of things over them, so I will most likely need to convert that collection into something else (a pandas.Series would probably be the best, concatenating groups in each window).
I think I have to start with something like this:
df.groupby('A').rolling(w)
but then how to get the elements?
I have tried to use apply, but I get only one window as a numpy array, not a window over groups.
Example:
In [1]: import pandas as pd
...: import numpy as np
...: import random
...:
...: random.seed(2)
...:
...: indexes = [i for i in range(1,100) for _ in range(10)]
...: dfi = pd.DataFrame({'A': indexes, 'B': [random.randint(1,99) for e in indexes]})
...: print(dfi.head()) # this is an input example
A B
0 1 8
1 1 12
2 1 11
3 1 47
4 1 22
In [2]: result = []
...: w = 3
...: for i in range(1,100):
...: result.append({'A': i, 'B': np.array([e for j, e in dfi.values if abs(i-j) < w or abs(min(i,j)+99-max(i,j)
...: ) < w])})
...: dfo = pd.DataFrame(result) # this is the expected output, to be obtained with groupby and rolling operations
...: print(dfo.head())
A B
0 1 [8, 12, 11, 47, 22, 95, 86, 40, 33, 78, 28, 78...
1 2 [8, 12, 11, 47, 22, 95, 86, 40, 33, 78, 28, 78...
2 3 [8, 12, 11, 47, 22, 95, 86, 40, 33, 78, 28, 78...
3 4 [28, 78, 5, 75, 88, 21, 56, 82, 51, 93, 66, 48...
4 5 [66, 48, 70, 57, 65, 35, 5, 4, 47, 60, 41, 49,...
Note that there are 50 elements in each array, 10 for each group, and then a window size of 5 (3 radius, non-inclusive)