here's a sample of my dataset
side | serial_number | inspector | date_1 | date_2
top | 10 | Paul | 4/1/18 13:21 | 4/1/18 14:22
bot | 10 | Jack | 4/1/18 13:01 | 4/1/18 14:22
bot | 11 | Jack | 4/1/18 14:01 | 4/1/18 14:53
top | 11 | Paul | 4/1/18 14:25 | 4/1/18 14:53
top | 12 | Henry | 4/1/18 14:25 | 4/1/18 14:58
For each unique tuple (serial_number, date_2), I want to keep the row where date_1 is minimum and keep every column, so that eventually my dataset looks like this :
side | serial_number | inspector | date_1 | date_2
bot | 10 | Jack | 4/1/18 13:01 | 4/1/18 14:22
bot | 11 | Jack | 4/1/18 14:01 | 4/1/18 14:53
top | 12 | Henry | 4/1/18 14:25 | 4/1/18 14:58
To do so, my current code looks like this :
import pandas as pd
df = pd.read_csv("data.csv") #getting the data in a pandas dataframe
df_sorted = df.groupby(['serial_number','date_2'], sort=False)['date_1'].min()
df_sorted .to_csv("data_sorted.csv")
So in the end, I got the right dataset but columns I'm not grouping by are missing. Here's the resulting dataset :
serial_number | date_1 | date_2
10 | 4/1/18 13:01 | 4/1/18 14:22
11 | 4/1/18 14:01 | 4/1/18 14:53
12 | 4/1/18 14:25 | 4/1/18 14:58
How do I keep all columns ? Thank you.