Pandas: Resampling Hourly Data for each Group

Question

I have a dataframe that contains GPS locations of vehicles received at various times in a day. For each vehicle, I want to resample the data hourly so that I have the median report (according to the timestamp) for each hour of the day. For hours with no corresponding rows, I want a blank row. I am using the following code:

import pandas as pd

idx = range(24)  # every hour of the day, 0..23
for j in df['id'].unique():
    data = df.loc[df['id'] == j].copy()  # copy avoids SettingWithCopyWarning
    data['hour'] = data['timestamp'].dt.hour  # the .dt accessor is needed on a Series
    data_grouped = data.groupby(['imo', 'hour']).median().reset_index()
    data = data_grouped.set_index('hour').reindex(idx).reset_index()
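The question asks for the "median report (according to the timestamp)", while groupby(...).median() returns the median of each column separately, which need not match any real report. If the intent is instead to keep the actual report whose timestamp sits in the middle of each hour, a loop-free sketch could look like this. It assumes the column names id, imo and timestamp from the code above; the toy rows and the lat/lon columns are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two vehicles, several GPS reports in some hours.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "imo": [111, 111, 111, 111, 222, 222],
    "timestamp": pd.to_datetime([
        "2020-09-21 00:05", "2020-09-21 00:20", "2020-09-21 00:50",
        "2020-09-21 03:10", "2020-09-21 01:00", "2020-09-21 01:30",
    ]),
    "lat": [10.0, 10.1, 10.2, 11.0, 20.0, 20.5],
    "lon": [50.0, 50.1, 50.2, 51.0, 60.0, 60.5],
})
df["hour"] = df["timestamp"].dt.hour

# Sort so that the row order inside each (id, imo, hour) group follows the timestamp.
df = df.sort_values(["id", "imo", "hour", "timestamp"]).reset_index(drop=True)
g = df.groupby(["id", "imo", "hour"])
pos = g.cumcount()                         # 0-based position within the group
size = g["timestamp"].transform("count")   # number of reports in the group

# Keep the middle report; for an even count this picks the lower-middle one.
median_rows = df[pos == (size - 1) // 2]
```

The result has one real report per vehicle and hour that has data; hours without reports would still need the reindexing step to become blank rows.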

Since my dataframe has millions of IDs, iterating through all of them takes a lot of time. Is there an efficient way of doing this?

Unlike the question "Pandas reindex dates in Groupby", I have multiple rows for each hour, and some hours have no rows at all.
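Both cases (several rows per hour, and hours with no rows) can be handled without a per-vehicle loop: aggregate once over all vehicles, then reindex against every observed (id, imo) pair combined with hours 0 to 23. This is a sketch assuming the column names from the question; the toy rows and the lat/lon columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: vehicle 1 reports in hours 0 and 3, vehicle 2 only in hour 1.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "imo": [111, 111, 111, 222],
    "timestamp": pd.to_datetime(["2020-09-21 00:05", "2020-09-21 00:50",
                                 "2020-09-21 03:10", "2020-09-21 01:00"]),
    "lat": [10.0, 12.0, 13.0, 20.0],
    "lon": [50.0, 52.0, 53.0, 60.0],
})
df["hour"] = df["timestamp"].dt.hour.astype("int64")

# One groupby over all vehicles: median of each numeric column per (id, imo, hour).
agg = df.groupby(["id", "imo", "hour"])[["lat", "lon"]].median()

# Full index: each observed (id, imo) pair repeated for every hour 0..23.
pairs = agg.index.droplevel("hour").unique()
full_index = pd.MultiIndex.from_arrays(
    [
        np.repeat(pairs.get_level_values("id"), 24),
        np.repeat(pairs.get_level_values("imo"), 24),
        np.tile(np.arange(24), len(pairs)),
    ],
    names=["id", "imo", "hour"],
)

# Hours without reports become rows of NaN ("blank rows").
result = agg.reindex(full_index).reset_index()
```

Building the full index from the observed pairs, rather than from every id combined with every imo, keeps the output at exactly 24 rows per vehicle.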

Can you add some sample data? – jezrael, Sep 22 '20 at 05:54

score 1 · Accepted Answer · answered Sep 22 '20 at 06:15

Tested in the latest version of pandas: convert the hour column to a categorical with all possible categories, then aggregate without a loop:

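import pandas as pd
# A categorical with all 24 hours, so hours with no rows still form a group
df['hour'] = pd.Categorical(df['timestamp'].dt.hour, categories=range(24))
df1 = df.groupby(['id', 'imo', 'hour'], observed=False).median().reset_index()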
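To see the categorical approach on a small input, here is a sketch with one vehicle whose reports fall only in hours 0 and 3 (the toy rows and the lat column are hypothetical). Passing observed=False keeps the unobserved categories; it was the default in older pandas, and newer versions warn when it is left implicit.

```python
import pandas as pd

# Hypothetical toy data: one vehicle, two reports in hour 0 and one in hour 3.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "imo": [111, 111, 111],
    "timestamp": pd.to_datetime(["2020-09-21 00:05", "2020-09-21 00:50",
                                 "2020-09-21 03:10"]),
    "lat": [10.0, 12.0, 13.0],
})

# All 24 hours are declared as categories, even though only 0 and 3 occur.
df["hour"] = pd.Categorical(df["timestamp"].dt.hour, categories=range(24))

# observed=False keeps the empty categories, so hours without reports
# appear as rows with NaN in the aggregated column.
df1 = df.groupby(["id", "imo", "hour"], observed=False)[["lat"]].median().reset_index()
```

One caveat: with observed=False, pandas expands the output to every combination of the grouping keys, including the non-categorical id and imo, so with millions of vehicles it is worth checking the size of the result before relying on it.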
