Pandas: How to groupby consecutive column values

Question

I have a DataFrame like this:

           Frame    0         1       ...    start_frame  end_frame    phn
0             0   7.648325  0.098433  ...          0.0       25.0       h#
1             1   8.006168  0.045991  ...         10.0       35.0       h#
2             2   8.260857  0.331792  ...         20.0       45.0       h#
3             3   8.211206  0.126892  ...         30.0       55.0       h#
4             4   7.999766  0.219560  ...         40.0       65.0       h#
5             5   7.602877  0.095582  ...         50.0       75.0       h#
6             6   7.747911  0.118326  ...         60.0       85.0       h#
7             7   7.958229 -0.049620  ...         70.0       95.0       h#
...
25           25  15.159771  2.047468  ...        250.0      275.0       sh
26           26  15.580827  1.910970  ...        260.0      285.0       ix
27           27  15.899938  1.510074  ...        270.0      295.0       ix
28           28  16.191772  1.646987  ...        280.0      305.0       ix
29           29  16.055186  1.585445  ...        290.0      315.0       ix
..          ...        ...       ...  ...          ...        ...      ...
336         336  15.277283  1.688955  ...       3360.0     3385.0        y
337         337  15.446976  1.615444  ...       3370.0     3395.0       ih
338         338  15.628509  1.944911  ...       3380.0     3405.0       ih
339         339  15.737163  1.736013  ...       3390.0     3415.0       ih
...
361         361   8.719288 -1.060700  ...       3610.0     3635.0       h#
362         362   8.500200 -0.810346  ...       3620.0     3645.0       h#
363         363   8.186726 -0.479683  ...       3630.0     3655.0       h#
364         364   8.151884 -0.277089  ...       3640.0     3665.0       h#
365         365   7.944815 -0.460370  ...       3650.0     3675.0       h#

I would like to obtain a separate structure for each run of consecutive equal values in column 'phn'. For instance:

1) A first matrix with the rows in range(0, 7) for the first occurrence of 'h#'

2) A second matrix with the rows in range(value, 25) for 'sh'

and so on, until the last matrix with the rows in range(361, 365) for the last occurrence of 'h#'.

The result can be any generic data structure that contains the requested rows, such as a DataFrame, a NumPy array, or a simple matrix. – IlVileEzioGreggio, Mar 03 '19 at 18:15
If I try `list(file.groupby(['phoneme']))`, the first result is a matrix containing the rows for all occurrences of `h#`. I would like to separate the _non-consecutive_ occurrences. – IlVileEzioGreggio, Mar 03 '19 at 18:23

jezrael · Accepted Answer · 2019-03-03T18:21:57.257


First group by runs of consecutive values, then build a list or dict from the groups:

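# give each run of consecutive equal 'phn' values its own integer id
g = df['phn'].ne(df['phn'].shift()).cumsum()
# for list: one DataFrame per run
L = [v for k, v in df.groupby(g)]
print(L)

# for dictionary: run id -> DataFrame
d = dict(tuple(df.groupby(g)))
# alternative
d = {k: v for k, v in df.groupby(g)}
print(d)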
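
To see why the grouping key separates non-consecutive occurrences, here is a minimal self-contained sketch on a toy frame. The eight-row data and the names `starts`, `run_id`, and `runs` are made up for illustration and are not from the question; only the `phn` column name and the `shift`/`ne`/`cumsum` technique come from the thread.

```python
import pandas as pd

# Toy stand-in for the question's frame (hypothetical data).
df = pd.DataFrame({
    "Frame": range(8),
    "phn": ["h#", "h#", "sh", "ix", "ix", "ix", "h#", "h#"],
})

# True wherever a row's label differs from the previous row's label.
# The first row is compared against NaN, so it always starts a run.
starts = df["phn"].ne(df["phn"].shift())

# The running count of run starts gives each consecutive run its own id.
run_id = starts.cumsum()

# One sub-DataFrame per run, in order of appearance.
runs = [block for _, block in df.groupby(run_id)]

# Print the label plus the first and last row index of each run.
for block in runs:
    print(block["phn"].iloc[0], block.index[0], block.index[-1])

# Grouping on the label alone merges the two separate 'h#' runs.
n_label_groups = df.groupby("phn").ngroups
```

On this toy data `run_id` has four distinct values, while grouping on `phn` directly yields only three groups because both `h#` runs collapse into one. If a NumPy array is preferred over a DataFrame, each block can be converted with `block.to_numpy()`.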

edited Mar 03 '19 at 18:21

answered Mar 03 '19 at 18:19


It's working! Thanks a lot! <3 – IlVileEzioGreggio Mar 03 '19 at 18:26
