How do I use groupby to do a sliding-window calculation in pandas? Imagine I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'type': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'data': [1, 10, 2, 4, 3, 4, 5, 6]})
df
  type  data
0    A     1
1    A    10
2    A     2
3    A     4
4    B     3
5    B     4
6    B     5
7    B     6
For every type in the dataframe, I want to compute the standard deviation of the 1st and 3rd rows only (ignoring the data in the 2nd row), on a sliding window. For the A's, this means the first standard deviation uses these rows:
  type  data
0    A     1  <----
1    A    10
2    A     2  <----
3    A     4
Then the next one uses these:
  type  data
0    A     1
1    A    10  <----
2    A     2
3    A     4  <----
And so on, repeated for the other types in this example. You can assume there are many more than 4 types and more than 4 rows per type. Is there a way to do this with groupby? I know it is doable with iloc, but I was hoping there was a more elegant, standard way with groupby or some other pandas function. I'm hoping something like this (hypothetical) call exists:
df.groupby(df.type).sliding_window(slide=2).std()
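A minimal sketch of one groupby-based approach, assuming pandas' default sample standard deviation (ddof=1; NumPy's np.std defaults to ddof=0, which would give 0.5 instead of about 0.707 for [1, 2]). Within each type, groupby(...).shift(-2) lines every row up with the row two positions later, so each row holds exactly the two endpoints, and a row-wise std over those two columns never sees the middle value. The helper name endpoint_std and its gap parameter are hypothetical, not pandas API.

```python
import pandas as pd

df = pd.DataFrame({'type': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'data': [1, 10, 2, 4, 3, 4, 5, 6]})


def endpoint_std(frame, gap=2, ddof=1):
    """Hypothetical helper: std of each row's value and the value `gap` rows
    later within the same type (the rows in between are ignored)."""
    # Shift within each type so pairs never cross a group boundary;
    # the last `gap` rows of each group get NaN as their partner.
    later = frame.groupby('type')['data'].shift(-gap)
    pairs = pd.concat([frame['data'], later.rename('data_later')], axis=1)
    # Row-wise std over exactly two values; skipna=False keeps incomplete pairs as NaN.
    return pairs.std(axis=1, ddof=ddof, skipna=False)


result = df.assign(endpoint_std=endpoint_std(df))
print(result)
```

Each result is labeled at the first row of its pair (index 0 holds the std of rows 0 and 2). For two values a and b with ddof=1 the std reduces to |a - b| / sqrt(2), so (later - frame['data']).abs() / 2 ** 0.5 would be an equivalent closed form.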
EDIT: It seems rolling will not work, because I want ONLY the two endpoints used for std(), not the whole window. For example, the first calculation should be std([1, 2]), because we look at index 0 and 2 exclusively and ignore the value at index 1.
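If rolling is still preferred, one possible workaround (a sketch, not the only way) is rolling(...).apply with raw=True: each 3-row window is passed in as a NumPy array, and the function keeps only its first and last elements, so the middle value is dropped as the EDIT requires. Assumptions: ddof=1 to match pandas' default, and endpoints_only is a hypothetical name. Unlike the shift-based version, rolling labels each result at the last row of its window, so the std of rows 0 and 2 lands on index 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'data': [1, 10, 2, 4, 3, 4, 5, 6]})


def endpoints_only(window):
    # With raw=True, `window` is a NumPy array of the 3 values in the window;
    # keep only the first and last, ignoring the middle one.
    return np.std(window[[0, -1]], ddof=1)


rolled = (df.groupby('type')['data']
            .rolling(3)                          # windows never cross a type boundary
            .apply(endpoints_only, raw=True)
            .reset_index(level=0, drop=True))   # drop the 'type' level, keep the original index
print(rolled)
```

For comparison, a plain rolling(3).std() on the first A window would use all of [1, 10, 2], which is why it does not match std([1, 2]). Because apply calls a Python function once per window, this version is likely slower than the shift-based one on large frames.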