
Say I have a selection of rows from a dataframe stored in a variable errorData. When I display this variable, the correct rows are shown (i.e. the selection appears to be valid). My goal is to set Percent to np.nan only in the rows that match the criteria in my variable.

errorData = df.loc[(df['Percent'] == 100) &\
                  (df['Rating1'] != 8) &\
                  (df['Rating2'] != 1)&\
                  (df['Grade'] == "NG")]

for i in errorData:
        df['Percent'].replace(df['Percent']==100, np.nan,inplace=True)

However, this doesn't work as intended. When I inspect the Percent column again after this operation, values of 100 have also been replaced in rows where

df['Grade'] == "B"

I've tried a couple of other ways too, like:

for i in errorData:
        df['Percent'].replace(100, np.nan,inplace=True)

But again, to no avail. Sorry I haven't posted sample rows here. I've seen that done on other questions but I'm not entirely sure on the formatting of that.

Apologies in advance for any errors in the above.

Update: to clarify, if I execute

df.loc[(df['Percent'] == 100) &\
                  (df['Rating1'] != 8) &\
                  (df['Rating2'] != 1)&\
                  (df['Grade'] == "NG")].shape

It returned (129,8) -- i.e. my valid rows.

And if I perform

df['Percent'].isnull().sum()

Before the change I get 0, but after the change I get 400. This means it's editing more than just the rows selected in my variable errorData, and I cannot see why.

Karim

2 Answers


I've never answered my own question before! But I found the answer here:

Selecting with complex criteria from pandas.DataFrame

For anyone wondering what the solution was, the code format from the first response in that question worked in my situation:

df.loc[(df["Percent"] == 100) & (df["Rating1"] != 8) &\
      (df["Rating2"] != 1) & (df['Grade'] == "NG"), "Percent"] = np.nan
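
For anyone who wants to check the result, here is a minimal sketch of the same assignment with the condition stored in a named mask. The sample rows and the names mask, n_matching and n_missing are made up for illustration; only the column names come from my data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Percent": [100.0, 100.0, 100.0, 80.0, 100.0],
    "Rating1": [5, 8, 5, 5, 5],
    "Rating2": [2, 2, 1, 2, 2],
    "Grade": ["NG", "NG", "NG", "NG", "B"],
})

# Build the boolean mask once so it can be inspected and reused.
mask = (
    (df["Percent"] == 100)
    & (df["Rating1"] != 8)
    & (df["Rating2"] != 1)
    & (df["Grade"] == "NG")
)

n_matching = int(mask.sum())        # rows that meet every condition
df.loc[mask, "Percent"] = np.nan    # set only those rows, only that column
n_missing = int(df["Percent"].isnull().sum())
```

If Percent had no missing values beforehand, n_missing should equal n_matching afterwards, and rows with Grade "B" keep their 100.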
Karim

With this:

df.loc[(df['Percent'] == 100) &\
       (df['Rating1'] != 8) &\
       (df['Rating2'] != 1)&\
       (df['Grade'] == "NG")]

You're selecting all columns from the rows that match these conditions.

Since the change should only affect the Percent column, pass that column name to .loc as well. That way, you can assign to it directly.

df.loc[(df['Percent'] == 100) &\
       (df['Rating1'] != 8) &\
       (df['Rating2'] != 1)&\
       (df['Grade'] == "NG"), 'Percent'] = np.nan
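
As a rough illustration of why the loop in the question changed more rows than intended, the sketch below uses a made-up three-row frame (the data and the names error_data, loop_items, replaced and n_replaced are hypothetical). Iterating over a DataFrame gives its column labels, and Series.replace matches values across the whole column regardless of the earlier selection.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Percent": [100.0, 100.0, 90.0],
    "Rating1": [5, 5, 5],
    "Rating2": [2, 2, 2],
    "Grade": ["NG", "B", "NG"],
})

# Simplified version of the selection from the question.
error_data = df.loc[(df["Percent"] == 100) & (df["Grade"] == "NG")]

# Iterating over a DataFrame yields its column labels, not its rows.
loop_items = [i for i in error_data]

# Series.replace works on values across the whole column; it knows nothing
# about error_data, so the "B" row is changed as well.
replaced = df["Percent"].replace(100, np.nan)
n_replaced = int(replaced.isnull().sum())
```

Here error_data holds one row, yet two values are replaced, which is the same effect as the jump from 129 to 400 described in the question.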
jmiguel