I am trying to replace the NaN values in the 'cylinders' column using a dictionary that maps each model to its median number of cylinders. I expected this to be straightforward, but every approach I try gets stuck.
cylinders_model_med = df.groupby('model').agg({'cylinders': 'median'})
cylinders_model_med = cylinders_model_med.to_dict()
#output:
'bmw x5': 6.0,
'buick enclave': 6.0,
'cadillac escalade': 8.0,
'chevrolet camaro': 6.0,
'chevrolet camaro lt coupe 2d': 6.0,
'chevrolet colorado': 5.0,
'chevrolet corvette': 8.0,
'chevrolet cruze': 4.0,
'chevrolet equinox': 4.0,
'chevrolet impala': 6.0,
'chevrolet malibu': 4.0,
'chevrolet silverado': 8.0,
'chevrolet silverado 1500': 8.0,
'chevrolet silverado 1500 crew': 8.0,
'chevrolet silverado 2500hd': 8.0,
'chevrolet silverado 3500hd': 8.0,
'chevrolet suburban': 8.0,
'chevrolet tahoe': 8.0,
'chevrolet trailblazer': 6.0,
'chevrolet traverse': 6.0,
'chrysler 200': 4.0,
'chrysler 300': 6.0,
'chrysler town & country': 6.0,
'dodge charger': 6.0,
'dodge dakota': 6.0,
'dodge grand caravan': 6.0,
'ford econoline': 8.0,
'ford edge': 6.0,
'ford escape': 4.0,
'ford expedition': 8.0,
'ford explorer': 6.0,
'ford f-150': 8.0,
'ford f-250': 8.0,
'ford f-250 sd': 8.0,
'ford f-250 super duty': 8.0,
'ford f-350 sd': 8.0,
'ford f150': 8.0,
'ford f150 supercrew cab xlt': 6.0,
'ford f250': 8.0,
'ford f250 super duty': 8.0,
'ford f350': 8.0,
'ford f350 super duty': 8.0,
'ford focus': 4.0,
'ford focus se': 4.0,
'ford fusion': 4.0,
'ford fusion se': 4.0,
'ford mustang': 6.0,
'ford mustang gt coupe 2d': 8.0,
'ford ranger': 6.0,
'ford taurus': 6.0,
'gmc acadia': 6.0,
'gmc sierra': 8.0,
'gmc sierra 1500': 8.0,
'gmc sierra 2500hd': 8.0,
'gmc yukon': 8.0,
'honda accord': 4.0,
'honda civic': 4.0,
'honda civic lx': 4.0,
'honda cr-v': 4.0,
'honda odyssey': 6.0,
'honda pilot': 6.0,
'hyundai elantra': 4.0,
'hyundai santa fe': 6.0,
'hyundai sonata': 4.0,
'jeep cherokee': 6.0,
'jeep grand cherokee': 6.0,
'jeep grand cherokee laredo': 6.0,
'jeep liberty': 6.0,
'jeep wrangler': 6.0,
'jeep wrangler unlimited': 6.0,
'kia sorento': 4.0,
'kia soul': 4.0,
'mercedes-benz benze sprinter 2500': 6.0,
'nissan altima': 4.0,
'nissan frontier': 6.0,
'nissan frontier crew cab sv': 6.0,
'nissan maxima': 6.0,
'nissan murano': 6.0,
'nissan rogue': 4.0,
'nissan sentra': 4.0,
'nissan versa': 4.0,
'ram 1500': 8.0,
'ram 2500': 6.0,
'ram 3500': 6.0,
'subaru forester': 4.0,
'subaru impreza': 4.0,
'subaru outback': 4.0,
'toyota 4runner': 6.0,
'toyota camry': 4.0,
'toyota camry le': 4.0,
'toyota corolla': 4.0,
'toyota highlander': 6.0,
'toyota prius': 4.0,
'toyota rav4': 4.0,
'toyota sienna': 6.0,
'toyota tacoma': 6.0,
'toyota tundra': 8.0,
'volkswagen jetta': 4.0,
'volkswagen passat': 4.0}}
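To make the first step reproducible without the real dataset, here is a minimal sketch on hypothetical toy data (the model names and values below are made up, not taken from the actual df). It runs the same groupby/agg/to_dict steps and shows that `DataFrame.to_dict()`, with its default `orient='dict'`, returns a nested dict keyed first by column name, which matches the closing `}}` in the output above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real df
df = pd.DataFrame({
    "model": ["ford f-150", "ford f-150", "honda civic", "honda civic"],
    "cylinders": [8.0, np.nan, 4.0, np.nan],
})

# Same two steps as in the question: aggregate per model, then convert to a dict
cylinders_model_med = df.groupby("model").agg({"cylinders": "median"})
cylinders_model_med = cylinders_model_med.to_dict()

# DataFrame.to_dict() defaults to orient="dict": {column: {index label: value}},
# so the only top-level key is 'cylinders' and the model medians sit one level down
print(cylinders_model_med)
print(cylinders_model_med["cylinders"]["ford f-150"])  # 8.0
```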
#input:
df['cylinders'] = df['cylinders'].fillna(cylinders_model_med)
df['cylinders'].isna().sum()
#output:
5260
That is the same number of NaN values I started with, so nothing was filled. I'm new here, so let me know if you need more (or less) information.
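A self-contained reproduction of the fill step, again on hypothetical toy data, assuming the same column names as the question. `Series.fillna` with a dict looks up the Series' *index labels* (here 0 to 3) in the dict, not the values of the 'model' column, so a dict whose only key is 'cylinders' matches no row and fills nothing. For comparison, the sketch also shows one common alternative: first map each row's model to its median, then fill from that row-aligned Series. This is an assumption about the intended result, not a confirmed fix for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real df
df = pd.DataFrame({
    "model": ["ford f-150", "ford f-150", "honda civic", "honda civic"],
    "cylinders": [8.0, np.nan, 4.0, np.nan],
})

cylinders_model_med = df.groupby("model").agg({"cylinders": "median"}).to_dict()

# Reproduces the question: fillna(dict) matches dict keys against index labels 0..3;
# the only key is 'cylinders', so no label matches and the NaNs stay
unchanged = df["cylinders"].fillna(cylinders_model_med)
print(unchanged.isna().sum())  # 2

# Alternative (assumed intent): look up each row's model in the inner dict,
# producing a Series aligned on the same index, then fill from it
per_row_median = df["model"].map(cylinders_model_med["cylinders"])
filled = df["cylinders"].fillna(per_row_median)
print(filled.isna().sum())  # 0
```

With this approach, any model whose cylinders values are all missing has a NaN median and would still be left unfilled.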
Thank you for your time!