I have a pandas data frame, df, that has second-by-second data (Longitude, Latitude, etc.) for each driver. The data frame consists of several trips. There is a feature called Event_Type that can be used to determine the start and end of trips:
ignitionOnList = df[df['Event_Type'] == 'Ignition On'].index.tolist()
ignitionOffList = df[df['Event_Type'] == 'Ignition Off'].index.tolist()
So, imagine I have 5 trips in this data frame. The lengths of ignitionOnList and ignitionOffList would both be 5. I'd like to analyse each trip separately and store the results in a pandas data frame. Here's what I do:
dfTrips = pd.DataFrame({'Date' : [],'Vehicle' : [], 'Trip_Number' : [], 'Start_Time' : [], 'Duration' : [],
'Collision': [],'Harsh_Steering' : [], 'Harsh_Deceleration' : [], 'Harsh_Acceleration' : [],
'Harsh_Preferred_Speed' : []})
tripCount = -1
tripNumbers = len(ignitionOnList)
for tripNumber in range(tripNumbers):
    tripCount += 1
    # Reset the index so the label lookups below (0 and shape[0]-1) hit the first and last rows
    dfTemp = df.loc[ignitionOnList[tripNumber]:ignitionOffList[tripNumber]+1].reset_index(drop=True)
    # Doing stuff to this temporary data frame and storing the results, for example:
    dfTrips.loc[tripCount,'Start_Time'] = dfTemp.loc[0,'Time'].strftime("%H:%M:%S")
    dfTrips.loc[tripCount,'Finish_Time'] = dfTemp.loc[dfTemp.shape[0]-1,'Time'].strftime("%H:%M:%S")
    # Using functions I have defined, such as `get_steering_risk`, to get risky behaviour for each trip
    dfTrips.loc[tripCount,'Harsh_Deceleration'] = get_deceleration_risk(dfTemp)
    dfTrips.loc[tripCount,'Harsh_Steering'] = get_steering_risk(dfTemp)
This works. But I am guessing there are better ways to do this in Python without for loops. I am not sure I can simply use apply because I am not applying the same function to the whole data frame.
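To make the question concrete, here is a minimal sketch of what a loop-free version might look like: label every row with a trip number and let groupby hand each trip to a function. The toy data, the Speed column, and the summarize_trip helper are made up for illustration, and the labeling assumes every row belongs to a trip (no rows recorded between an Ignition Off and the next Ignition On).

```python
import pandas as pd

# Toy stand-in for df: two short trips, one row per second (hypothetical data).
df = pd.DataFrame({
    "Event_Type": ["Ignition On", "Moving", "Ignition Off",
                   "Ignition On", "Moving", "Moving", "Ignition Off"],
    "Time": pd.date_range("2020-01-01 08:00:00", periods=7,
                          freq=pd.Timedelta(seconds=1)),
    "Speed": [0.0, 5.0, 0.0, 0.0, 4.0, 8.0, 0.0],
})

# Each "Ignition On" row starts a new trip, so a running count of those rows
# labels every row with its 0-based trip number.
df["Trip_Number"] = (df["Event_Type"] == "Ignition On").cumsum() - 1


def summarize_trip(trip):
    """Hypothetical per-trip summary; any trip-level function could go here."""
    return pd.Series({
        "Start_Time": trip["Time"].iloc[0].strftime("%H:%M:%S"),
        "Finish_Time": trip["Time"].iloc[-1].strftime("%H:%M:%S"),
        "Max_Speed": trip["Speed"].max(),
    })


# One row per trip, indexed by Trip_Number.
dfTrips = df.groupby("Trip_Number")[["Time", "Speed"]].apply(summarize_trip)
print(dfTrips)
```

The function passed to apply receives one trip at a time, so it does not have to be the same kind of computation as a row-wise function on the whole data frame.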
An alternative might be to redefine the functions so that they produce a column in df, apply them to the whole data frame, and then aggregate the results for each trip. For example, the get_steering_risk function could be defined to return 0 or 1 for each second in df, and then the percentage of 1s for each trip would be Harsh_Steering in dfTrips. However, some functions cannot be applied to the whole data frame. For example, one function regresses velocity against acceleration, and that has to be done trip by trip. What is the best way to approach this? Thanks.
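For illustration, the two kinds of function might be sketched roughly as follows. This is only a sketch under assumptions: the Steering column, the threshold of 10, the toy numbers, and the velocity_acceleration_slope helper are all hypothetical, a Trip_Number column is assumed to exist already, and the regression direction (acceleration as a function of velocity) is a guess at what the real function does.

```python
import numpy as np
import pandas as pd

# Toy data: two trips of four seconds each (hypothetical values).
df = pd.DataFrame({
    "Trip_Number": [0, 0, 0, 0, 1, 1, 1, 1],
    "Steering": [1.0, 12.0, 3.0, 15.0, 2.0, 1.0, 11.0, 0.0],
    "Velocity": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0],
    "Acceleration": [2.0, 4.0, 6.0, 8.0, 1.0, 2.0, 3.0, 4.0],
})

# Row-wise part: flag each second as harsh (1) or not (0) over the whole frame.
STEERING_THRESHOLD = 10.0  # hypothetical cut-off
df["Harsh_Steering_Flag"] = (df["Steering"].abs() > STEERING_THRESHOLD).astype(int)

# The share of flagged seconds per trip is the group mean of the 0/1 flags.
harsh_steering = df.groupby("Trip_Number")["Harsh_Steering_Flag"].mean()


def velocity_acceleration_slope(trip):
    """Trip-wise part: least-squares fit of acceleration = slope * velocity + intercept."""
    slope, _intercept = np.polyfit(trip["Velocity"], trip["Acceleration"], deg=1)
    return slope


# groupby passes each trip's rows to the function separately.
slopes = df.groupby("Trip_Number")[["Velocity", "Acceleration"]].apply(
    velocity_acceleration_slope
)

# Both results are indexed by Trip_Number, so they align into one table.
dfTrips = pd.DataFrame({
    "Harsh_Steering": harsh_steering,
    "Velocity_Accel_Slope": slopes,
})
print(dfTrips)
```

Both results end up indexed by trip number, so row-wise and trip-wise measures could sit side by side in dfTrips.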