I have a large data set and need to randomly sample a smaller data set from it. The first column contains vehicle IDs, and the sampling should be done over these vehicle IDs. Each vehicle has more than one record, so there are multiple rows per vehicle, and every row of a sampled vehicle should end up in the result. I have put the code I am using below, but it takes a long time to run. Is there a faster way to do this?
Example:
df:
vehicle_ID SectionID time
1 200 00:00:03
100 237 00:00:03
1 1872 00:00:06
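Code:
import random
import pandas as pd

veh = df['vehicle_ID'].unique()
# pick 12900 distinct vehicle IDs at random
sample = random.sample(list(veh), 12900)
ndf = pd.DataFrame()
# collect the rows of each sampled vehicle, one vehicle at a time (this is the slow part)
for i in sample:
    new = df[df['vehicle_ID'] == i]
    ndf = ndf.append(new, ignore_index=True)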
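
One approach that may be faster is to build a single boolean mask with `Series.isin` instead of filtering and appending once per vehicle. The following is only a minimal sketch: the toy `df` (including the rows beyond the three shown in the example) and the name `n_vehicles` are made up for illustration, and the real sample size of 12900 would replace the small value used here.

```python
import random
import pandas as pd

# Toy stand-in for the real data; values beyond the three example rows are hypothetical
df = pd.DataFrame({
    "vehicle_ID": [1, 100, 1, 7, 100, 42],
    "SectionID": [200, 237, 1872, 15, 90, 300],
    "time": ["00:00:03", "00:00:03", "00:00:06", "00:00:06", "00:00:09", "00:00:09"],
})

n_vehicles = 2  # hypothetical; 12900 in the real data set

# Sample distinct vehicle IDs, as in the original code
veh = df["vehicle_ID"].unique()
sample = random.sample(list(veh), n_vehicles)

# One vectorized pass: keep every row whose vehicle_ID is in the sample
ndf = df[df["vehicle_ID"].isin(sample)].reset_index(drop=True)
```

Note that this keeps the rows in their original `df` order, whereas the loop groups them vehicle by vehicle in the order the IDs were sampled. If that grouping matters, the result would need an extra sort.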