
I have some code that helps me predict some missing values. This is the code:

from datawig import SimpleImputer
from datawig.utils import random_split
from sklearn.metrics import f1_score, classification_report
df_train, df_test = random_split(df, split_ratios=[0.8, 0.2])
# Initialize a SimpleImputer model
imputer = SimpleImputer(
    input_columns=['SITUACION_DNI_A'],  # columns containing information about the column we want to impute
    output_column='EXTRANJERO_A',  # the column we'd like to impute values for
    output_path='imputer_model'  # stores model data and metrics
)

# Fit an imputer model on the train data
imputer.fit(train_df=df_train, num_epochs=10)

# Impute missing values and return original dataframe with predictions
predictions = imputer.predict(df_test)

After that I get a new dataframe with fewer rows than the original. How can I insert the predicted values into my original dataframe? Or is there a way to run the code on my whole dataframe and not just the test split?

Mecha

1 Answer


If both dataframes have a unique column, or something that can act as an ID, then this method will work:

df_test = df_test.set_index('unique_col')
# fillna returns a new dataframe, so assign the result back
df_test = df_test.fillna(predictions.set_index('unique_col'))
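To write the predictions back into the original dataframe rather than the test split, something like the following may help. It is only a sketch on toy data: `unique_col` is a hypothetical ID column, and it assumes the dataframe returned by `imputer.predict` carries the predicted values in a column named `EXTRANJERO_A_imputed` (the output column name with an `_imputed` suffix), so check the column names of your own `predictions` first.

```python
import pandas as pd

# Toy stand-in for the original dataframe; 'unique_col' is a hypothetical ID column.
df = pd.DataFrame({
    'unique_col': [1, 2, 3, 4],
    'SITUACION_DNI_A': ['a', 'b', 'a', 'b'],
    'EXTRANJERO_A': ['no', None, 'no', None],
})

# Stand-in for imputer.predict(df_test): a subset of the rows plus an
# assumed 'EXTRANJERO_A_imputed' column holding the predicted values.
predictions = df[df['unique_col'].isin([2, 4])].copy()
predictions['EXTRANJERO_A_imputed'] = ['yes', 'no']

# Look up the predicted value for each ID, then fill only the missing cells.
imputed_by_id = predictions.set_index('unique_col')['EXTRANJERO_A_imputed']
df['EXTRANJERO_A'] = df['EXTRANJERO_A'].fillna(df['unique_col'].map(imputed_by_id))
```

Rows that already had a value keep it, and the row count of `df` does not change. Since `predict` takes any dataframe that has the input columns, calling `imputer.predict(df)` on the whole dataframe should also work and would avoid the merge step entirely.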

If the above method does not work, drop the rows with missing values and append the imputer predictions to the dataframe. See the following links for help:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

Delete rows if there are null values in a specific column in Pandas dataframe
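The drop-and-append approach could look roughly like this. Note that `DataFrame.append` was removed in pandas 2.0, so this sketch uses `pd.concat` instead. The data is made up, and the `missing` frame only stands in for whatever the imputer returns for the rows that had no value.

```python
import pandas as pd

# Toy stand-in for the original dataframe.
df = pd.DataFrame({
    'SITUACION_DNI_A': ['a', 'b', 'a', 'b'],
    'EXTRANJERO_A': ['no', None, 'no', None],
})

# Rows whose target value is missing; pretend these values came from the imputer.
missing = df[df['EXTRANJERO_A'].isna()].copy()
missing['EXTRANJERO_A'] = ['yes', 'no']

# Keep only the rows that already had a value, then add the imputed rows back.
complete = df.dropna(subset=['EXTRANJERO_A'])
combined = pd.concat([complete, missing]).sort_index()
```

Sorting by index at the end restores the original row order, which only works if the index was preserved when the rows were split off.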

secretive