I have a dataframe with about 4 million rows and 19 columns. I used groupby
to group my data by DateTime, aggregating DateTime to 1-minute intervals.
For most of the other columns I used mean().
The dataframe looks similar to this:
   DateTime                   SerialNumber  GpsLongitude  GpsLatitude  Hours   Fuel   DriverStatus
0  2019-09-19 12:56:35+00:00  C0000000      70.175616     25.976910    307.85  12.67  Automatic
1  2019-09-19 12:57:05+00:00  C0000000      70.175570     25.977966    307.86  12.53  Automatic
2  2019-09-19 12:56:50+00:00  C0000000      70.175601     25.977438    307.86  12.65  off
3  2019-09-19 12:59:50+00:00  C0000000      70.175605     25.977438    307.87  12.65  off
I used the following code:
new_df = (
    df.groupby(np.array(df['DateTime'], dtype='datetime64[m]'), as_index=True, sort=True)
    .agg({
        'GpsLongitude': 'median',
        'GpsLatitude': 'median',
        'Hours': 'mean',
        'Fuel': 'mean',
        'Engine_rpm': 'mean',
        'EngineLoad': 'mean',
        'FuelConsumption_l_h': 'mean',
        'SpeedGearbox_km_h': 'mean',
        'SpeedRadar_km_h': 'mean',
        'DriverStatus': lambda x: ','.join(x.astype(str)),
        'SerialNumber': lambda x: ','.join(x.astype(str)),
    })
    .reindex(['GpsLongitude', 'GpsLatitude', 'TotalWorkingHours',
              'GroundSpeed_km_h', 'Engine_rpm', 'EngineLoad_perc',
              'Drum_rpm', 'Fan_rpm', 'SerialNumber'], axis=1)
)
print(new_df)
The problem is that after running the code, I get the following message:
SpecificationError: nested renamer is not supported
I then made a test table with different columns and fewer rows, used exactly the same code, and it works! Any suggestions?
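Since the same code works on a table with different columns, one thing worth checking is whether every key in the aggregation dictionary is really a column of the big dataframe, and whether every name passed to reindex was actually aggregated. The snippet below is only a minimal sketch of that check: the two-row frame, `agg_spec`, and `wanted` are made-up stand-ins, and the idea that a missing key triggers this SpecificationError is an assumption that may depend on the pandas version.

```python
import pandas as pd

# Hypothetical tiny frame standing in for the real 4-million-row dataframe.
df = pd.DataFrame({"Hours": [307.85, 307.86], "Fuel": [12.67, 12.53]})

# Hypothetical aggregation spec and reindex list, shortened for illustration.
agg_spec = {"Hours": "mean", "Fuel": "mean", "Engine_rpm": "mean"}
wanted = ["Hours", "TotalWorkingHours"]

# Keys in the spec that are not columns of the dataframe.
missing_in_df = [col for col in agg_spec if col not in df.columns]

# Names requested in reindex that the spec never produces.
missing_in_result = [col for col in wanted if col not in agg_spec]

print("not in df.columns:", missing_in_df)
print("not produced by agg:", missing_in_result)
```

If either list is non-empty for the real data, the column names in the dictionary and in reindex do not line up with the dataframe.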
Edit: I also tried it without a dictionary, like this:
new_df = (
    df.groupby(np.array(df['DateTime'], dtype='datetime64[m]'), as_index=False, sort=False)
    .agg(
        GpsLongitude='median',
        GpsLatitude='median',
        TotalWorkingHours='mean',
        GroundSpeed_km_h='mean',
        Engine_rpm='mean',
        EngineLoad_perc='mean',
        Drum_rpm='mean',
        Fan_rpm='mean',
        SerialNumber=lambda x: ','.join(x.astype(str)),
    )
)
print(new_df)
but I got:
TypeError: Must provide 'func' or tuples of '(column, aggfunc).
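For reference, the keyword form of agg (named aggregation) expects each keyword to be a tuple of (column, aggfunc) rather than a bare function name, which matches what the TypeError says. Here is a minimal sketch on the four sample rows above; the use of `dt.floor("min")` for the 1-minute grouping and the reduced column set are my own assumptions for illustration, not the original code.

```python
import pandas as pd

# The four sample rows from the question, with only a few of the columns.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2019-09-19 12:56:35+00:00",
        "2019-09-19 12:57:05+00:00",
        "2019-09-19 12:56:50+00:00",
        "2019-09-19 12:59:50+00:00",
    ], utc=True),
    "SerialNumber": ["C0000000"] * 4,
    "GpsLongitude": [70.175616, 70.175570, 70.175601, 70.175605],
    "Fuel": [12.67, 12.53, 12.65, 12.65],
    "DriverStatus": ["Automatic", "Automatic", "off", "off"],
})

# Truncate each timestamp to the start of its minute.
minute = df["DateTime"].dt.floor("min")

# Named aggregation: output_name=(input_column, aggregation_function).
named = df.groupby(minute).agg(
    GpsLongitude=("GpsLongitude", "median"),
    Fuel=("Fuel", "mean"),
    DriverStatus=("DriverStatus", lambda x: ",".join(x.astype(str))),
    SerialNumber=("SerialNumber", lambda x: ",".join(x.astype(str))),
)
print(named)
```

The input column in each tuple has to exist in the dataframe, so this form has the same column-name requirement as the dictionary version.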