I have a DataFrame with about 4 million rows and 19 columns. I used groupby to group the data by DateTime, truncated to 1-minute resolution. For the other columns I mostly aggregated with mean().

The DataFrame looks similar to this:

     DateTime                SerialNumber GpsLongitude GpsLatitude  Hours   Fuel   DriverStatus
0   2019-09-19 12:56:35+00:00   C0000000    70.175616   25.976910   307.85  12.67   Automatic
1   2019-09-19 12:57:05+00:00   C0000000    70.175570   25.977966   307.86  12.53   Automatic
2   2019-09-19 12:56:50+00:00   C0000000    70.175601   25.977438   307.86  12.65   off
3   2019-09-19 12:59:50+00:00   C0000000    70.175605   25.977438   307.87  12.65   off
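To make the setup reproducible, here is a minimal sketch that rebuilds the four sample rows above and groups them per minute. It includes only the columns visible in the sample, not the full 19, and it uses `Series.dt.floor("min")` in place of the `np.array(..., dtype='datetime64[m]')` cast from the question. That is a swapped-in technique: it also truncates to the minute, but it keeps the UTC timezone.

```python
import pandas as pd

# The four sample rows from the question (only the columns shown there).
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2019-09-19 12:56:35+00:00",
        "2019-09-19 12:57:05+00:00",
        "2019-09-19 12:56:50+00:00",
        "2019-09-19 12:59:50+00:00",
    ], utc=True),
    "SerialNumber": ["C0000000"] * 4,
    "GpsLongitude": [70.175616, 70.175570, 70.175601, 70.175605],
    "GpsLatitude": [25.976910, 25.977966, 25.977438, 25.977438],
    "Hours": [307.85, 307.86, 307.86, 307.87],
    "Fuel": [12.67, 12.53, 12.65, 12.65],
    "DriverStatus": ["Automatic", "Automatic", "off", "off"],
})

# Truncate each timestamp to the start of its minute and group on that key.
minute = df["DateTime"].dt.floor("min").rename("Minute")
per_minute = df.groupby(minute).agg({
    "GpsLongitude": "median",
    "GpsLatitude": "median",
    "Hours": "mean",
    "Fuel": "mean",
    # Concatenate the string values that fall into the same minute.
    "DriverStatus": lambda x: ",".join(x.astype(str)),
})
print(per_minute)
```

Rows 0 and 2 (12:56:35 and 12:56:50) fall into the same 12:56 bucket, so the result has three rows.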

I used the following code:

    new_df = df.groupby(np.array(df['DateTime'], dtype='datetime64[m]'), as_index=True, sort=True).agg({
        'GpsLongitude': 'median',
        'GpsLatitude': 'median',
        'Hours': 'mean',
        'Fuel': 'mean',
        'Engine_rpm': 'mean',
        'EngineLoad': 'mean',
        'FuelConsumption_l_h': 'mean',
        'SpeedGearbox_km_h': 'mean',
        'SpeedRadar_km_h': 'mean',
        'DriverStatus': lambda x: ','.join(x.astype(str)),
        'SerialNumber': lambda x: ','.join(x.astype(str))
    }).reindex(['GpsLongitude', 'GpsLatitude', 'TotalWorkingHours',
                'GroundSpeed_km_h', 'Engine_rpm', 'EngineLoad_perc', 'Drum_rpm', 'Fan_rpm', 'SerialNumber'], axis=1)
    print(new_df)

The problem is that when I run this code, I get the following error:

SpecificationError: nested renamer is not supported

I then built a smaller test table with different columns and fewer rows, ran exactly the same code, and it worked. Any suggestions?
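One thing that may be worth checking: in early pandas 1.x releases, `DataFrameGroupBy.agg` raises this `SpecificationError` when a key of the aggregation dictionary is not a column of the DataFrame (later releases report the missing column names directly). The dictionary above uses names such as `EngineLoad` while the `reindex` list uses `EngineLoad_perc` and `TotalWorkingHours`, so a name mismatch is a plausible suspect, not a confirmed cause. The sketch below compares the dictionary keys with `df.columns` before aggregating; `check_agg_keys` is a hypothetical helper and the toy frame's columns and values are made up for illustration.

```python
import pandas as pd


def check_agg_keys(df: pd.DataFrame, spec: dict) -> list:
    """Hypothetical helper: return the aggregation keys that are not columns of df."""
    return sorted(set(spec) - set(df.columns))


# Toy frame with made-up values, standing in for the real 19-column data.
df = pd.DataFrame({
    "GpsLongitude": [70.1756, 70.1755],
    "EngineLoad_perc": [41.0, 43.0],
})

# Aggregation spec that refers to a column name the frame does not have.
spec = {"GpsLongitude": "median", "EngineLoad": "mean"}

missing = check_agg_keys(df, spec)
print(missing)  # ['EngineLoad']
```

An empty list means every key in the dictionary exists in the frame, which would rule this cause out.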

Edit: I also tried it without a dictionary, like this:

    new_df = df.groupby(np.array(df['DateTime'], dtype='datetime64[m]'), as_index=False, sort=False).agg(
        GpsLongitude='median',
        GpsLatitude='median',
        TotalWorkingHours='mean',
        GroundSpeed_km_h='mean',
        Engine_rpm='mean',
        EngineLoad_perc='mean',
        Drum_rpm='mean',
        Fan_rpm='mean',
        SerialNumber=lambda x: ','.join(x.astype(str)))
    print(new_df)

but I got:

TypeError: Must provide 'func' or tuples of '(column, aggfunc).
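For reference, pandas named aggregation (available since 0.25) expects each keyword to map an output name to a `(column, aggfunc)` tuple or a `pd.NamedAgg`, not to a bare function name such as `'median'`, which is what this `TypeError` is complaining about. A minimal sketch on a small frame that reuses a few of the question's column names; the `Minute` key and the `Engine_rpm` values are made up for illustration.

```python
import pandas as pd

# Small illustrative frame; "Minute" stands in for the per-minute grouping key.
df = pd.DataFrame({
    "Minute": pd.to_datetime(["2019-09-19 12:56", "2019-09-19 12:56", "2019-09-19 12:57"]),
    "GpsLongitude": [70.175616, 70.175601, 70.175570],
    "Engine_rpm": [1500.0, 1520.0, 1490.0],
    "SerialNumber": ["C0000000", "C0000000", "C0000000"],
})

new_df = df.groupby("Minute", as_index=False, sort=False).agg(
    # Each keyword is output_name=(input_column, aggregation).
    GpsLongitude=("GpsLongitude", "median"),
    Engine_rpm=("Engine_rpm", "mean"),
    SerialNumber=("SerialNumber", lambda x: ",".join(x.astype(str))),
)
print(new_df)
```

With `as_index=False` the grouping key comes back as a regular `Minute` column in front of the aggregated ones.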