1

i have a df with MachineType, Prod/RT, and several other columns. MachineType contains either TRUE or FALSE. need to .fillna and .replace but in different ways for MachineType. (filling values are different for TRUE and FALSE)

Dataframe : updatedDf

my code do above calc:

updatedDf['Prod/RT']=updatedDf[updatedDf['MachineType']==True]['Prod/RT'].replace(np.inf,0.021660)
updatedDf['Prod/RT']=updatedDf[updatedDf['MachineType']==True]['Prod/RT'].fillna(0.021660)


updatedDf['Prod/RT']=updatedDf[updatedDf['MachineType']==False]['Prod/RT'].replace(np.inf,0.050261)
updatedDf['Prod/RT']=updatedDf[updatedDf['MachineType']==False]['Prod/RT'].fillna(0.050261)

But my code gives an unexpected output with Nan values. Is there any way to fix this error?or can't we .fillna and .replace like above way?

enter image description here

johnson
  • 379
  • 2
  • 17

1 Answers1

1

My approach to your problem is to wrap the filling and replacing in a function and use it as parameter in pandas .apply(). Using you approach will require the usage of .loc[].

updatedDf = pd.DataFrame({
    'MachineType' : np.random.choice([True, False], 10, True),
    'Prod/RT' : np.random.choice([np.nan, np.inf, random.random()], 10, True)
})

# solution 1
prod_RT_dict = {True:0.21660, False:0.050261}
def fillProd_RT(row):
    if row['Prod/RT'] != np.inf and pd.notna(row['Prod/RT']):
        return row['Prod/RT']
    else:
        return prod_RT_dict[row['MachineType']]
updatedDf['Prod/RT_2'] = updatedDf.apply(fillProd_RT, axis=1)

# solution 2
updatedDf['Prod/RT_3']=updatedDf['Prod/RT'].replace(np.inf,np.nan)
updatedDf.loc[updatedDf['MachineType']==True,'Prod/RT_3']=updatedDf\
    .loc[updatedDf['MachineType']==True,'Prod/RT_3'].fillna(0.021660)
updatedDf.loc[updatedDf['MachineType']==False,'Prod/RT_3']=updatedDf\
    .loc[updatedDf['MachineType']==False,'Prod/RT_3'].fillna(0.050261)

updatedDf

enter image description here

Gusti Adli
  • 1,225
  • 4
  • 13
  • updatedDf\ .loc[updatedDf['MachineType']==False,'Prod/RT_3'].fillna(0.050261) what does \ meaning after updatedDf? – johnson May 03 '21 at 18:07
  • 1
    `\` is a wrapper for long one line code. I added it because I didn't want to write those line in one long line. If you don't understand my explanation, read [Breaking a line of python to multiple lines?](https://stackoverflow.com/questions/14417571/breaking-a-line-of-python-to-multiple-lines) – Gusti Adli May 03 '21 at 18:13