There is a dataset of vehicles with columns such as type (sedan, SUV, truck, etc.), odometer, cylinders, and price. I am addressing the missing values in the 'cylinders' column, which contains the number of cylinders in the vehicle's engine. My approach is to fill the missing values with the median number of cylinders per vehicle type. Using a pivot table, it looks like this: [screenshot of the pivot table]

Now I want to create a for loop that goes through every row and, when it finds a NaN value in the 'cylinders' column, replaces it with the median value from the pivot table for that vehicle type.

Thanks

  • `df['column'].fillna(df['second_column'],inplace=True)` – Ameya Jun 30 '22 at 16:37
  • Welcome to Stack Overflow. We ask that questions include sample input and expected output in the body of the question as text, not as an image or link, along with enough code to make a minimal, reproducible example. Please take a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your question with this information so that we can offer better answers. – G. Anderson Jun 30 '22 at 16:54

1 Answer

Here is a for loop that goes through every row in your cars dataframe. When it finds a NaN value, it looks up that car's type in your pivot_table and replaces the NaN with the Cylinders value for that type.

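import pandas as pd

# Fill each missing 'Cylinders' value with the pivot-table value for that row's 'Type'.
for index, row in cars_table.iterrows():
    if pd.isnull(row['Cylinders']):
        # Look up the value by label (the vehicle type) rather than by integer position.
        cars_table.loc[index, 'Cylinders'] = pivot_table.loc[row['Type'], 'Cylinders']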
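
A loop-free variant is also possible and is usually faster on large frames. The sketch below is only an illustration: it assumes the column names 'Type' and 'Cylinders' used in the loop above, and the sample rows are made up because the question does not include the data. It computes the per-type median with `groupby(...).transform('median')` instead of reading it from the pivot table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the loop above.
cars_table = pd.DataFrame({
    'Type': ['sedan', 'sedan', 'sedan', 'SUV', 'SUV', 'truck', 'truck'],
    'Cylinders': [4, 4, np.nan, 6, np.nan, 8, np.nan],
})

# Median cylinders per vehicle type, broadcast back to every row.
# NaN values are skipped when the median is computed.
median_by_type = cars_table.groupby('Type')['Cylinders'].transform('median')

# Replace only the missing values; existing values are left untouched.
cars_table['Cylinders'] = cars_table['Cylinders'].fillna(median_by_type)

print(cars_table)
```

If a vehicle type has no known cylinder counts at all, its median is NaN and those rows stay missing, so it may be worth checking `cars_table['Cylinders'].isna().sum()` afterwards.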
Grg Alx