My current solution works but is slow, and performance is the problem. Is there a more idiomatic ('pandonic') pandas way to do this without the user-defined function? The goal is to keep only the rows whose datetime equals the first timestamp that occurs in each group.
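
    def get_first_id_time(df):
        # .iloc[0] takes the first row by position; df['datetime'][0] looks up
        # the index label 0, which may not exist inside a given group.
        first_time = df['datetime'].iloc[0]
        # Keep every row in the group that shares that first timestamp.
        return df.loc[df['datetime'] == first_time]

    data = data.groupby('id').apply(get_first_id_time)
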
EDIT: Note that each group has many rows with datetime == first_time, and all of them should be kept.
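
To make the desired result concrete, here is a small sketch of one vectorized direction that might avoid the per-group Python function. The sample frame and the `value` column are made up for illustration. It assumes the rows are already ordered so that the first row of each group carries the first timestamp, and that `datetime` has no missing values (`transform('first')` takes the first non-null value per group).

```python
import pandas as pd

# Hypothetical sample data: two ids, each with repeated first timestamps.
data = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "datetime": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 00:05",
        "2020-01-02 12:00", "2020-01-02 12:00", "2020-01-02 12:30",
    ]),
    "value": [10, 11, 12, 20, 21, 22],
})

# Broadcast each group's first timestamp back onto every row of that group.
first_time = data.groupby("id")["datetime"].transform("first")

# Keep all rows whose timestamp matches their group's first timestamp.
result = data[data["datetime"] == first_time]
print(result)
```

Unlike the `groupby(...).apply(...)` version, this keeps the original index rather than adding `id` as an extra index level. If "first" should mean "earliest" regardless of row order, `transform("min")` could be swapped in.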