I have a dataframe with information about year, month, day, time and other variables (var1, var2, var3 etc). I want to do the following:

(1) Arrange in the ascending order of time

(2) Group together rows with same year, month, and day.

(3) Within each group, make chunks of 3 (or N in general) consecutive events. If fewer than 3 events remain at the end of a group, they should still be put together in a chunk of their own.

E.g.:

   year  Month  Day  Time (hr)      var1      var2      var3
   2018      8   17   8.716667  0.152741 -1.647750 -1.605000
   2018      8   17   8.716667  0.093366  2.000781  1.973152
   2018      8   17   8.716667  0.239732 -1.663985 -1.698509
   2018      8   17   8.716667  0.184664  1.689448  1.649670
   2018      8   17   8.716667  0.097565  1.619323  1.645629

I would like the first 3 rows chunked together like this:

   2018            8         17   8.716667  0.152741 -1.647750 -1.605000 
   2018            8         17   8.716667  0.093366  2.000781  1.973152 
   2018            8         17   8.716667  0.239732 -1.663985 -1.698509

and the remaining rows in a separate chunk:

   2018            8         17   8.716667  0.184664  1.689448  1.649670     
   2018            8         17   8.716667  0.097565  1.619323  1.645629 

so that I can do further processing in these chunks.

I can do (1) and (2) like this:

 df = dfFull.sort_values(by='event_time', ascending=True)
 df = df.groupby(['event_year', 'event_month', 'event_day'])

However, I am not sure how to achieve step (3).

Any suggestions would be appreciated.

Thanks

Debutante
  • You can just chunk the result by using : list_df = [df[i:i+n] for i in range(0,df.shape[0],n)] where n is the size of the chunk Found here : https://stackoverflow.com/questions/44729727/pandas-slice-large-dataframe-into-chunks – S.Gradit Mar 23 '22 at 09:42
  • or: https://stackoverflow.com/questions/52819416/dividing-a-pandas-groupby-object-into-chunks – mozway Mar 23 '22 at 09:45
  • Thanks a lot. list_df.append(np.array_split(df.get_group(key), n)) worked perfectly for my purpose – Debutante Mar 25 '22 at 08:03

1 Answer

For (3) you could create a chunk index and then use it for a new grouping:

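   df = dfFull.sort_values(by='event_time', ascending=True)
   # number the rows within each day (0, 1, 2, ...) and integer-divide by the chunk size
   df['chunkIndex'] = df.groupby(['event_year', 'event_month', 'event_day']).cumcount() // 3
   # each (year, month, day, chunkIndex) group is now one chunk of at most 3 rows
   chunks = df.groupby(['event_year', 'event_month', 'event_day', 'chunkIndex'])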
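To see the chunk index at work, here is a minimal, self-contained sketch for a general chunk size N. The sample values, the `day_keys` and `chunk_sizes` names, and the single `var1` column are made up for illustration; the column names `event_year`, `event_month`, `event_day` and `event_time` are taken from the question's code.

```python
import pandas as pd

N = 3  # chunk size (assumed to be a positive integer)

# Hypothetical sample data: 5 events on one day, 2 on the next
dfFull = pd.DataFrame({
    "event_year":  [2018] * 7,
    "event_month": [8] * 7,
    "event_day":   [17, 17, 17, 17, 17, 18, 18],
    "event_time":  [8.9, 8.7, 8.8, 9.1, 9.0, 7.5, 7.2],
    "var1":        [0.15, 0.09, 0.24, 0.18, 0.10, 0.30, 0.20],
})

day_keys = ["event_year", "event_month", "event_day"]

# (1) sort by time
df = dfFull.sort_values(by="event_time", ascending=True)

# (2) + (3): cumcount() numbers the rows 0, 1, 2, ... within each day,
# so integer division by N gives a per-day chunk id
df["chunkIndex"] = df.groupby(day_keys).cumcount() // N

# Collect every (day, chunkIndex) group as its own DataFrame
chunks = [chunk for _, chunk in df.groupby(day_keys + ["chunkIndex"])]
chunk_sizes = [len(chunk) for chunk in chunks]
```

With this sample, the first day is split into a chunk of 3 rows plus a leftover chunk of 2, and the second day yields a single chunk of 2. Each element of `chunks` is an ordinary DataFrame, so further processing can be done in a plain loop or with `groupby(...).apply(...)`.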
Learning is a mess