3

How do use a groupby to do a sliding window for a calculation in Pandas? Imagine I have a dataframe that looks like this:

df = pd.DataFrame({'type': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'data': [1,10,2,4,3,4,5,6]})


df
  type  data
0    A     1
1    A     10
2    A     2
3    A     4
4    B     3
5    B     4
6    B     5
7    B     6

and for every type in the dataframe I want to determine the standard deviation between the 1st and 3rd row (and only those, ignoring the data in row 2), on a sliding scale. This means for the A's I'd want to use these rows to find the first std dev:

  type  data
0    A     1    <----
1    A     10
2    A     2    <----
3    A     4

then on these:

  type  data
0    A     1    
1    A     10   <----
2    A     2
3    A     4   <----

and so on, repeated for the other types in this example. You can assume that there are a lot more than 4 types and more than 4 rows for each type. Is there a way to do something like this with groupby? I know this is doable with iloc, but I was hoping there was a more elegant and standard way with groupby or some other pandas function. I'm hoping there is something like this that will work . . .

df.groupby(df.type).sliding_window(slide=2).std()

EDIT: It seems rolling will not work. I want ONLY the endpoints used for std(), not the whole window. As an example, the first calculation should be std([1, 2]) because we'll be looking at index 0 and 2 exclusively and ignoring the value in index 1.

Bryant
  • 3,011
  • 1
  • 18
  • 26
  • Duplicate of: https://stackoverflow.com/questions/13996302/python-rolling-functions-for-groupby-object – Erfan Mar 25 '21 at 23:55
  • perhaps the solution is the same (potentially) but that result did not come up when searching for pandas groupby sliding window – Bryant Mar 26 '21 at 00:03
  • also, it now seems rolling does not work, so probably not a duplicate – Bryant Mar 26 '21 at 11:32

1 Answers1

1

rolling

df.groupby('type').rolling(3).std()

        data
type        
A    0   NaN
     1   NaN
     2   1.0
     3   1.0
B    4   NaN
     5   NaN
     6   1.0
     7   1.0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • That output doesnt look correct. I'd expected to see a value at indexes 0 and 1, not at 2 and 3. – Bryant Mar 26 '21 at 00:01
  • it looks like adding a .shift(-2) to the end will work. If there is a better way, let me know – Bryant Mar 26 '21 at 00:06
  • actually @piRSquared, this wont work, my fault for picking bad example data. The rolling function uses the entire window for the std() calculation, i want to use ONLY indexes 0, and 2, it seems rolling does just what it says, a rolling calculation, using everything in the window. I only want to use the extremes/endpoints – Bryant Mar 26 '21 at 11:26
  • also, if this were only using the correct values I'd expect the std to be ~1.41. I updated the question so the numbers are more extreme. – Bryant Mar 26 '21 at 11:35
  • `df.groupby('type').rolling(2).std()` gives you the `~1.41` – piRSquared Mar 26 '21 at 13:43
  • when I run with the original dataframe, (with window 2), I get 0.707 for every row (except the first in the window, which is NaN). I don't think rolling will work, unless I re-organize the dataframe to have the rows I want next to eachother – Bryant Mar 26 '21 at 14:09