I have failed to extrapolate from the answers I have found on grouping a dataframe and then merging the group information computed by groupby back into the original dataframe. The documentation seems to be lacking on this point, and the SO answers I found do not apply to current pandas versions.

This code:

grouped = df.groupby(pd.Grouper(
            key=my_time_column,
            freq='15Min',
            label='left',
            sort=True)).apply(pd.DataFrame)

This yields a dataframe, but I have found no way to get from it to a dataframe that holds the same data as the original df plus a new column containing the start datetime of the group that each row belonged to in the groupby object.

Here's my current hack that solves it:

grouped = df.groupby(pd.Grouper(
            key=my_time_column,
            freq='15Min',
            label='left',
            sort=True))

sorted_df = grouped.apply(pd.DataFrame)

interval_starts = []
for group_idx, group_member_indices in grouped.indices.items():
    for group_member_index in group_member_indices:
        interval_starts.append(group_idx)

sorted_df['interval_group_start'] = interval_starts

I'm wondering whether there's a more elegant pandas way.

pandas version: 0.23.0

matanster
  • You want to use `groupby` + `transform` to return a like-indexed Series. In this case, you would probably sort by the time column and transform with `.first` to broadcast the result back to every group member. If you provide a minimal, complete, and verifiable example with sample data, I'm sure someone will provide the exact code. – ALollz Oct 08 '18 at 17:37
  • You are looking for `transform()`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html – rahlf23 Oct 08 '18 at 17:43
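
The `groupby` + `transform` suggestion in the comments above might look roughly like the sketch below. The column names `ts` and `value`, the sample rows, and the new column name `first_in_interval` are all made up for illustration. Note that `transform('min')` broadcasts the earliest timestamp actually present in each 15-minute group, which is not necessarily the left edge of the interval.

```python
import pandas as pd

# Hypothetical sample data, already sorted by the time column.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2018-10-08 17:01:00",
        "2018-10-08 17:14:00",
        "2018-10-08 17:16:00",
        "2018-10-08 17:44:00",
    ]),
    "value": [1, 2, 3, 4],
})

# Group into 15-minute bins, then broadcast each group's earliest
# timestamp back onto every row of that group (like-indexed result).
first_in_interval = (
    df.groupby(pd.Grouper(key="ts", freq="15min"))["ts"]
      .transform("min")
)

# The result shares the original index, so plain assignment lines up.
df["first_in_interval"] = first_in_interval
```

Because the result is indexed like the original frame, no intermediate `apply(pd.DataFrame)` step or manual loop is needed in this variant.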

1 Answer

IIUC, this should do what you're looking for:

grouped = df.groupby(pd.Grouper(key=my_time_column,
                                freq='15Min',
                                label='left',
                                sort=True)) \
            .apply(pd.DataFrame)
grouped['start'] = grouped.loc[:, my_time_column] \
                          .groupby(level=0) \
                          .transform('min')
PMende
  • Sorry if I misled you. I've added code that solves it to the question; basically, the code included in the question is fast enough. I'm not sure yet how to do it with `transform`. I should add a complete and reproducible example. – matanster Oct 08 '18 at 19:00
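
If what is wanted is the interval start itself, rather than the earliest timestamp in each group, one possible shortcut is to floor every timestamp to its 15-minute boundary with `Series.dt.floor`. This is only a sketch. It assumes the time column has a datetime64 dtype and that the bins are aligned to clock quarter-hours, which should match `pd.Grouper(freq='15Min', label='left')` with default settings. The column names `ts` and `value` and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; rows are deliberately not sorted by time.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2018-10-08 17:16:00",
        "2018-10-08 17:01:00",
        "2018-10-08 17:44:00",
        "2018-10-08 17:14:00",
    ]),
    "value": [3, 1, 4, 2],
})

# Round each timestamp down to the start of its 15-minute interval.
# This works row by row, so no groupby or sorting is required.
df["interval_group_start"] = df["ts"].dt.floor("15min")
```

The new column can then serve as an ordinary grouping key, for example `df.groupby("interval_group_start")`, while the original row order and index stay untouched.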