I have a DataFrame like this:

           Frame    0         1       ...    start_frame  end_frame    phn
0             0   7.648325  0.098433  ...          0.0       25.0       h#
1             1   8.006168  0.045991  ...         10.0       35.0       h#
2             2   8.260857  0.331792  ...         20.0       45.0       h#
3             3   8.211206  0.126892  ...         30.0       55.0       h#
4             4   7.999766  0.219560  ...         40.0       65.0       h#
5             5   7.602877  0.095582  ...         50.0       75.0       h#
6             6   7.747911  0.118326  ...         60.0       85.0       h#
7             7   7.958229 -0.049620  ...         70.0       95.0       h#
...
25           25  15.159771  2.047468  ...        250.0      275.0       sh
26           26  15.580827  1.910970  ...        260.0      285.0       ix
27           27  15.899938  1.510074  ...        270.0      295.0       ix
28           28  16.191772  1.646987  ...        280.0      305.0       ix
29           29  16.055186  1.585445  ...        290.0      315.0       ix
..          ...        ...       ...  ...          ...        ...      ...
336         336  15.277283  1.688955  ...       3360.0     3385.0        y
337         337  15.446976  1.615444  ...       3370.0     3395.0       ih
338         338  15.628509  1.944911  ...       3380.0     3405.0       ih
339         339  15.737163  1.736013  ...       3390.0     3415.0       ih
...
361         361   8.719288 -1.060700  ...       3610.0     3635.0       h#
362         362   8.500200 -0.810346  ...       3620.0     3645.0       h#
363         363   8.186726 -0.479683  ...       3630.0     3655.0       h#
364         364   8.151884 -0.277089  ...       3640.0     3665.0       h#
365         365   7.944815 -0.460370  ...       3650.0     3675.0       h#

I would like to obtain a separate structure for each run of consecutive identical values in column 'phn'. For instance:

1) A first matrix with the rows in range(0, 7) for the first occurrence of 'h#'

2) A second matrix with the rows in range(value, 25) for 'sh'

and so on, until the last matrix with the rows in range(361, 365) for the last occurrence of 'h#'.

  • Could you better explain what you mean by `a structure`? – yatu Mar 03 '19 at 18:13
  • A generic data structure that contains the requested rows, like a DataFrame, a NumPy array or even a simple matrix. – IlVileEzioGreggio Mar 03 '19 at 18:15
  • If I try `list(file.groupby(['phoneme']))`, the first result is a matrix containing the rows associated with all occurrences of `h#`. I would like to separate the _non-consecutive_ occurrences. – IlVileEzioGreggio Mar 03 '19 at 18:23

1 Answer


First group by runs of consecutive values, then create a list or dict:

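# label each run of consecutive equal 'phn' values with an increasing integer
g = df['phn'].ne(df['phn'].shift()).cumsum()

# for a list: one DataFrame per run
L = [v for k, v in df.groupby(g)]
print(L)

# for a dictionary keyed by run number
d = dict(tuple(df.groupby(g)))
# alternative
d = {k: v for k, v in df.groupby(g)}
print(d)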
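
To see how the run labels behave, here is a minimal self-contained sketch on toy data. The small `df` below and the names `run_id`, `runs`, `summary` and `n_plain` are made up for illustration and only stand in for the DataFrame from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's DataFrame
df = pd.DataFrame({
    "Frame": range(8),
    "phn": ["h#", "h#", "h#", "sh", "sh", "ix", "h#", "h#"],
})

# Mark rows where 'phn' differs from the previous row, then take the
# cumulative sum so every run of equal values shares one integer label
run_id = df["phn"].ne(df["phn"].shift()).cumsum()

# One sub-DataFrame per run, in order of appearance
runs = [chunk for _, chunk in df.groupby(run_id)]

# (label, first row index, last row index) for each run
summary = [(c["phn"].iloc[0], int(c.index[0]), int(c.index[-1])) for c in runs]
print(summary)

# For comparison: grouping on 'phn' directly merges the two separate h# runs
n_plain = df.groupby("phn").ngroups
print(len(runs), n_plain)
```

Here the two `h#` runs at the start and at the end end up in different groups, whereas grouping on `phn` directly puts them in the same one. If NumPy arrays are preferred over DataFrames, each chunk can be converted with `chunk.to_numpy()`.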
jezrael