I am having trouble with getting distinct values from my dataframe.. Below is the code i currently use, in line 25(3rd of vier()) is the issue: I would like to show the top 10 fastest drivers based on their average heat(go-kart heat) time.
Input:
HeatNumber,NumberOfKarts,KartNumber,DriverName,Laptime
334,11,5,Monique,00:53.862
334,11,5,Monique,00:59.070
334,11,5,Monique,00:47.832
334,11,5,Monique,00:47.213
334,11,5,Monique,00:51.975
334,11,5,Monique,00:46.423
334,11,5,Monique,00:49.539
334,11,5,Monique,00:49.935
334,11,5,Monique,00:45.267
334,11,12,Robert-Jan,00:55.606
334,11,12,Robert-Jan,00:52.249
334,11,12,Robert-Jan,00:50.965
334,11,12,Robert-Jan,00:53.878
334,11,12,Robert-Jan,00:48.802
334,11,12,Robert-Jan,00:48.766
334,11,12,Robert-Jan,00:46.003
334,11,12,Robert-Jan,00:46.257
334,11,12,Robert-Jan,00:47.334
334,11,20,Katja,00:56.222
334,11,20,Katja,01:01.005
334,11,20,Katja,00:50.296
334,11,20,Katja,00:48.004
334,11,20,Katja,00:51.203
334,11,20,Katja,00:47.672
334,11,20,Katja,00:50.243
334,11,20,Katja,00:50.453
334,11,20,Katja,01:06.192
334,11,13,Bensu,00:56.332
334,11,13,Bensu,00:54.550
334,11,13,Bensu,00:52.023
334,11,13,Bensu,00:52.518
334,11,13,Bensu,00:50.738
334,11,13,Bensu,00:50.359
334,11,13,Bensu,00:49.307
334,11,13,Bensu,00:49.595
334,11,13,Bensu,00:50.504
334,11,17,Marit,00:56.740
334,11,17,Marit,00:52.534
334,11,17,Marit,00:48.331
334,11,17,Marit,00:56.204
334,11,17,Marit,00:49.066
334,11,17,Marit,00:49.210
334,11,17,Marit,00:45.655
334,11,17,Marit,00:46.261
334,11,17,Marit,00:46.837
334,11,11,Niels,00:58.518
334,11,11,Niels,01:01.562
334,11,11,Niels,00:51.238
334,11,11,Niels,00:48.808
Code:
import pandas as pd
import matplotlib.pyplot as plt
#Data
df = pd.read_csv('dataset_kartanalyser.csv')
df = df.dropna(axis=0, how='any')
df = df.join(df['Laptime'].str.split(':', 1, expand=True).rename(columns={0:'M', 1:'S'}))
df['M'] = df['M'].astype(int)
df['S'] = df['S'].astype(float)
df['Laptime'] = (df['M'] * 60) + df['S']
df.drop(['M', 'S'], axis=1, inplace=True)
#Funties
def twee():
print("Het totaal aantal karts = " + str(df['KartNumber'].nunique()))
print("Het aantal unique drivers = " + str(df['DriverName'].nunique()))
print("Het totaal aantal heats = " + str(df['HeatNumber'].nunique()))
def drie():
print("De 10 snelste Drivers obv individuele tijd zijn: ")
print((df.groupby('DriverName')['Laptime'].nsmallest(1)).nsmallest(10))
def vier():
print('De 10 snelste Drivers obv snelste heat gemiddelde:')
print((df.groupby(['DriverName', 'HeatNumber'])['Laptime'].mean().round(3)).nsmallest(10))
print(df)
HeatNumber NumberOfKarts KartNumber DriverName Laptime
0 334 11 5 Monique 53.862
1 334 11 5 Monique 59.070
2 334 11 5 Monique 47.832
3 334 11 5 Monique 47.213
4 334 11 5 Monique 51.975
... ... ... ... ... ...
4053 437 2 20 luuk 39.678
4054 437 2 20 luuk 39.872
4055 437 2 20 luuk 39.454
4056 437 2 20 luuk 39.575
4057 437 2 20 luuk 39.648
Output:
DriverName HeatNumber
giovanni 411 26.233
ryan 411 27.747
giovanni 408 27.938
papa 394 28.075
guus 406 28.998
Rob 427 29.371
Suus 427 29.416
Jan-jullius 394 29.428
Joep 427 29.934
Indy 423 29.991
The output i get i almost correct, expect that the driver "giovanni" occurs twice. I would like to only show the fastest avg heat time for each driver. Anyone who know how to do this?