convert pandas groupby object to dataframe while preserving group semantics

Question

I have failed to extrapolate from the answers I found on grouping a dataframe and then merging the group information computed by groupby back into the original dataframe. The documentation seems to be lacking, and the SO answers I found do not apply to current pandas versions.

This code:

grouped = df.groupby(pd.Grouper(
            key=my_time_column,
            freq='15Min',
            label='left',
            sort=True)).apply(pd.DataFrame)

This yields a dataframe, but I have found no way to get from it to a dataframe that has the same data as the original df plus a new column holding the start datetime of the group that each row belonged to in the groupby object.

Here's my current hack that solves it:

grouped = df.groupby(pd.Grouper(
            key=my_time_column,
            freq='15Min',
            label='left',
            sort=True))

sorted_df = grouped.apply(pd.DataFrame)

# Repeat each group's start time once per member row, in group order
interval_starts = []
for group_idx, group_member_indices in grouped.indices.items():
    for group_member_index in group_member_indices:
        interval_starts.append(group_idx)

sorted_df['interval_group_start'] = interval_starts

Is there a more elegant pandas way to do this?

pandas version: 0.23.0

You want to use `groupby` + `transform` to return a like-indexed Series. In this case, probably sort by the time column and transform `.first` to broadcast the result back to every group member. If you provide a [mcve] with sample data, I'm sure someone will provide the exact code. — ALollz, Oct 08 '18 at 17:37
You are looking for `transform()`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html — rahlf23, Oct 08 '18 at 17:43
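
For readers following the comments above, here is a minimal sketch of what `groupby` + `transform` does. The sample data and the column names `ts` and `value` are assumptions made for illustration; they are not from the question. `transform` returns one value per original row, so the result can be attached to the dataframe as a new column.

```python
import pandas as pd

# Hypothetical sample data; "ts" and "value" are assumed column names.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2018-10-08 17:02:00",
        "2018-10-08 17:14:59",
        "2018-10-08 17:15:00",
        "2018-10-08 17:44:10",
    ]),
    "value": [1, 2, 3, 4],
})

groups = df.groupby(pd.Grouper(key="ts", freq="15min", label="left", sort=True))

# Each transform yields a Series aligned with df's index: the per-group
# aggregate is broadcast back to every row of that group.
first_in_group = groups["ts"].transform("min")
group_total = groups["value"].transform("sum")

df = df.assign(first_in_group=first_in_group, group_total=group_total)
print(df)
```

The first two rows fall into the same 15-minute group, so they share the same `first_in_group` and `group_total` values.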

PMende · Answer 1 · 2018-10-08T17:44:07.537

1

IIUC, this should do what you're looking for:

grouped = df.groupby(pd.Grouper(key=my_time_column, 
                                freq = '15Min', 
                                label='left', 
                                sort=True))\
            .apply(pd.DataFrame)
grouped['start'] = grouped.loc[:, my_time_column] \
                          .groupby(level=0) \
                          .transform('min')
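
Note that `transform('min')` on the time column gives the earliest timestamp actually present in each group, which is not necessarily the left edge of the 15-minute interval. If the interval start itself is what's needed, one possible sketch is to floor each timestamp. This assumes the bins are aligned to the clock (:00, :15, :30, :45), and the sample data and column names `ts` and `value` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; "ts" and "value" are assumed column names.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2018-10-08 17:02:00",
        "2018-10-08 17:14:59",
        "2018-10-08 17:15:00",
        "2018-10-08 17:44:10",
    ]),
    "value": [1, 2, 3, 4],
})

# Left edge of the clock-aligned 15-minute bin containing each timestamp.
df["interval_group_start"] = df["ts"].dt.floor("15min")

# For comparison: the earliest timestamp observed within each bin.
df["first_in_group"] = (
    df.groupby("interval_group_start")["ts"].transform("min")
)
print(df)
```

This keeps the original rows and index untouched and needs no `apply`; the new column can then be used as an ordinary groupby key.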

edited Oct 08 '18 at 17:44

answered Oct 08 '18 at 17:37

PMende


Sorry if I misled you. I've added code that solves it to the question; that code is fast enough for my purposes. I'm still unsure how to do it with `transform`. I should add a complete and reproducible example. – matanster Oct 08 '18 at 19:00
