I was trying to do something like the following. qc_data is my pandas DataFrame, and PH is one of the values in the NEW_QC_TEST column. How do I do this correctly?

qc_data[qc_data['NEW_QC_TEST'] == "PH"]["VALUE"].fillna(6.9, inplace=True)

I also tried the following, but I think it is even worse:

#qc_data[qc_data['NEW_QC_TEST'] == "PH"].isnull().sum()
#qc_data[qc_data['NEW_QC_TEST'] == "PH"].fillna(6.9, inplace=True)
DobbyTheElf

2 Answers

Use a boolean filter with .loc that selects the PH rows whose VALUE is missing, and assign to it directly:

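import numpy as np
import pandas as pd

# Small example frame: one PH row with a missing VALUE
qc_data = pd.DataFrame(data={'NEW_QC_TEST': [1, "PH"], 'VALUE': [2, np.nan]})
# Select PH rows whose VALUE is NaN and assign the default in a single .loc call
qc_data.loc[(qc_data['NEW_QC_TEST'] == "PH") & qc_data['VALUE'].isna(), "VALUE"] = 6.9
print(qc_data)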

Which produces the following:

  NEW_QC_TEST  VALUE
0           1    2.0
1          PH    6.9
DobbyTheElf

Most intuitively, you would expect this to work:

❌ Wrong approach:

qc_data[qc_data['NEW_QC_TEST'] == "PH"]["VALUE"].fillna(6.9, inplace=True)

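But it does not work because of chained indexing. I won't dive into the details, but in short: the first indexing step, qc_data[qc_data['NEW_QC_TEST'] == "PH"], returns a copy, and fillna then acts in place on that copy, so the original data stays intact. The pandas documentation discusses this under "Returning a view versus a copy".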
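
If you want to see this for yourself, here is a minimal sketch. The three-row frame is made up for illustration (it is not the asker's data), and the warnings are silenced only because different pandas versions emit different chained-assignment warnings for this line:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sample data: two PH rows (one missing) and one non-PH row (missing)
qc_data = pd.DataFrame({
    "NEW_QC_TEST": ["PH", "PH", "TEMP"],
    "VALUE": [np.nan, 7.2, np.nan],
})

with warnings.catch_warnings():
    # pandas may warn about chained assignment here; the warning type varies by version
    warnings.simplefilter("ignore")
    # Boolean indexing builds a temporary copy; fillna modifies that copy only
    qc_data[qc_data["NEW_QC_TEST"] == "PH"]["VALUE"].fillna(6.9, inplace=True)

# Count the NaNs still present in the original frame
remaining_nans = int(qc_data["VALUE"].isna().sum())
print(remaining_nans)
```

Both missing values are still there afterwards, including the one in the PH row.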

✅ Right approach:

Select the part you need with .loc and assign the result of fillna back into it:

rows_mask = qc_data['NEW_QC_TEST'] == "PH"
qc_data.loc[rows_mask, "VALUE"] = qc_data.loc[rows_mask, "VALUE"].fillna(6.9)
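
To check that only the PH rows are touched, here is the same two-line fix as a self-contained sketch. The sample frame is again made up for illustration, with a non-PH row that should keep its NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two PH rows (one missing) and one non-PH row (missing)
qc_data = pd.DataFrame({
    "NEW_QC_TEST": ["PH", "PH", "TEMP"],
    "VALUE": [np.nan, 7.2, np.nan],
})

# Boolean mask of the rows to fix
rows_mask = qc_data["NEW_QC_TEST"] == "PH"
# A single .loc assignment writes into the original frame, not a temporary copy
qc_data.loc[rows_mask, "VALUE"] = qc_data.loc[rows_mask, "VALUE"].fillna(6.9)

print(qc_data)
```

The missing PH value becomes 6.9, the existing PH value of 7.2 is left alone, and the TEMP row keeps its NaN.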

More explanations

vvv444