Group consecutive rows in a pandas dataframe by conditioning on hitting max value in another column

Question

I have a pandas dataframe indexed by a time series with columns of GPS latitude and acceleration for a satellite orbiting the Earth. This latitude oscillates between maximum and minimum values with a constant time period as expected. What I want to do is integrate the acceleration column over each orbital period.

I understand I need to use the pandas 'groupby' method to group each period. However, I can't figure out how to group the consecutive rows into orbital periods (say, by iterating through the rows and grouping until I hit the maximum latitude value, which defines the end of an orbit). After grouping, I can then apply a numerical integration to each period.

Example code that generates a similar DataFrame is given below.

import numpy as np
import pandas as pd
from scipy import signal

t = np.linspace(0, 1, 500)
np.random.seed(0)  # make sure we get the same values every time
df = pd.DataFrame(
    {'Lat': signal.sawtooth(2 * np.pi * 5 * t, 0.5),  # triangle wave, 5 periods
     'Acc': np.random.rand(500)},
    index=pd.date_range('1/1/2011 00:00:00.006392', periods=500, freq='10ms')
)

Any help would be much appreciated. If you need any more information, please ask!

Welcome to StackOverflow! Please read about [how to ask a question](https://stackoverflow.com/help/how-to-ask) (particularly [how to create a good example](https://stackoverflow.com/help/mcve)) in order to get good responses. — Alex, Feb 07 '18 at 22:03
Can you post a small sample reproducible data set and your desired data set? Also please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. — MaxU - stand with Ukraine, Feb 07 '18 at 22:04
@MaxU edited with some code generating a DataFrame with sample but similar data. — Andreas Ioannou, Feb 07 '18 at 22:46
@AndreasIoannou, I don't know what `"to integrate the acceleration column over each orbital period"` means, which is why I also asked for a desired data set. PS: I have slightly modified your code so that it produces the same random values (`np.random.seed(...)`) — MaxU - stand with Ukraine, Feb 07 '18 at 23:00
@MaxU I have the acceleration of the satellite as a function of time. I want to calculate the change in velocity over each orbital period (i.e. integrate the acceleration with respect to time to calculate the velocity). But first I need to split/group the rows of the DataFrame into single orbits. Does this make more sense? — Andreas Ioannou, Feb 07 '18 at 23:16

score 0 · Answer 1 · answered Feb 07 '18 at 23:34

IIUC here is a small demo:

In [280]: d = pd.DataFrame({'Lat':[1,3,4,2,0,1,2,3], 'Acc':np.random.randint(0,4,8)})

In [281]: d
Out[281]:
   Acc  Lat
0    3    1
1    1    3
2    2    4
3    3    2
4    2    0
5    2    1
6    2    2
7    2    3

In [282]: d.groupby(np.sign(d.Lat.diff().bfill()).diff().fillna(0).ne(0).cumsum())['Acc'].sum()
Out[282]:
Lat
0    6
1    5
2    6
Name: Acc, dtype: int32
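
For anyone who wants to run this outside an interactive session, here is a minimal sketch of the same grouping wrapped in a function. The function name `monotone_run_ids` is hypothetical, the `Acc` values are copied from the output shown above (the demo draws them at random), and `bfill()` is called with no arguments. Note that the group id increments on every change of direction, so each group is one rising or one falling stretch of latitude, not a full oscillation.

```python
import numpy as np
import pandas as pd


def monotone_run_ids(lat: pd.Series) -> pd.Series:
    """Label each row with the id of the rising or falling run it belongs to."""
    # Sign of the step from the previous row; the first row borrows the next step.
    slope_sign = np.sign(lat.diff().bfill())
    # True wherever the sign differs from the previous row's sign.
    changed = slope_sign.diff().fillna(0).ne(0)
    # Running count of direction changes gives one id per run.
    return changed.cumsum()


# Same Lat values as the demo; Acc pinned to the values printed above.
d = pd.DataFrame({"Lat": [1, 3, 4, 2, 0, 1, 2, 3],
                  "Acc": [3, 1, 2, 3, 2, 2, 2, 2]})

run_ids = monotone_run_ids(d["Lat"])
sums = d.groupby(run_ids)["Acc"].sum()
print(sums)
```

With these values the three runs sum to 6, 5 and 6, matching the output above.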

Details:

In [288]: d.Lat.diff().bfill()
Out[288]:
0    2.0
1    2.0
2    1.0
3   -2.0
4   -2.0
5    1.0
6    1.0
7    1.0
Name: Lat, dtype: float64

In [289]: np.sign(d.Lat.diff().bfill())
Out[289]:
0    1.0
1    1.0
2    1.0
3   -1.0
4   -1.0
5    1.0
6    1.0
7    1.0
Name: Lat, dtype: float64

In [290]: np.sign(d.Lat.diff().bfill()).diff().fillna(0).ne(0)
Out[290]:
0    False
1    False
2    False
3     True
4    False
5     True
6    False
7    False
Name: Lat, dtype: bool

In [291]: np.sign(d.Lat.diff().bfill()).diff().fillna(0).ne(0).cumsum()
Out[291]:
0    0
1    0
2    0
3    1
4    1
5    2
6    2
7    2
Name: Lat, dtype: int32
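
The question asks for whole orbits that end at each latitude maximum, and for an integral over time instead of a plain sum. Below is a rough sketch of one way to adapt the idea, not a definitive implementation. The helper names `label_orbits` and `trapezoid`, the toy data, and the choice to start a new orbit on the first falling row after a maximum are all assumptions made for illustration. Flat steps (two equal consecutive latitudes) are treated as continuing the previous direction, and the trapezoidal rule is written out by hand so the sketch depends only on basic NumPy operations.

```python
import numpy as np
import pandas as pd


def label_orbits(lat: pd.Series) -> pd.Series:
    """Integer orbit id per row; the id increments just after each latitude maximum."""
    direction = np.sign(lat.diff())
    # Assumption: a flat step continues the previous direction; then fill the leading NaN.
    direction = direction.replace(0, np.nan).ffill().bfill()
    # A new orbit starts on the first falling row that follows a rising row.
    starts = (direction == -1) & (direction.shift() == 1)
    return starts.cumsum()


def trapezoid(y: np.ndarray, x: np.ndarray) -> float:
    """Trapezoidal rule; returns 0.0 when fewer than two samples are given."""
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))


# Hypothetical toy data: three latitude maxima, one sample every 10 ms.
toy = pd.DataFrame(
    {"Lat": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1],
     "Acc": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0]},
    index=pd.date_range("2011-01-01", periods=12, freq="10ms"),
)
toy["orbit"] = label_orbits(toy["Lat"])
# Elapsed time in seconds since the first sample, used as the integration variable.
toy["t"] = (toy.index - toy.index[0]).total_seconds()

# Integrate Acc over time within each orbit to get a velocity change per orbit.
delta_v = pd.Series({
    orbit_id: trapezoid(g["Acc"].to_numpy(), g["t"].to_numpy())
    for orbit_id, g in toy.groupby("orbit")
})
print(delta_v)
```

The first and last groups are usually partial orbits and may need to be dropped. Integrating inside each group also skips the interval between the last sample of one orbit and the first sample of the next, so for coarsely sampled data it may be worth including the boundary row in both groups.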
