I cannot figure out how to use the index results from np.where in a for loop. I want to use this for loop to ONLY change the values of a column given the np.where index results.

This is a hypothetical example for a situation where I want to find the indexed location of certain problems or anomalies in my dataset, grab their locations with np.where, and then run a loop on the dataframe to recode them as NaN, while leaving every other index untouched.

Here is my simple code attempt so far:

import pandas as pd
import numpy as np

# import iris
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/iris.csv')

# conditional np.where -- hypothetical problem data
find_error = np.where((df['petal_length'] == 1.6) & 
                  (df['petal_width'] == 0.2))

# loop over column to change error into NA
for i in enumerate(find_error):
    df = df['species'].replace({'setosa': np.nan})

# df[i] is a problem but I cannot figure out how to get around this or an alternative
cs95
John Stud

1 Answer

You can directly assign to the column:

m = (df['petal_length'] == 1.6) & (df['petal_width'] == 0.2)
df.loc[m, 'species'] = np.nan

Or, fixing your code so that np.where does the replacement itself:

df['species'] = np.where(m, np.nan, df['species'])

Or, using Series.mask:

df['species'] = df['species'].mask(m)
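
If you specifically want to work with the integer positions that np.where returns, here is a minimal sketch. It assumes a small made-up frame in place of the iris CSV (the rows are illustrative, not taken from the real file), and the name `positions` is only for this example. Called with a single argument, np.where returns a tuple of arrays, so the positions are in its first element and pair naturally with iloc:

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the iris CSV (illustrative values only)
df = pd.DataFrame({
    'petal_length': [1.4, 1.6, 1.6, 4.7],
    'petal_width': [0.2, 0.2, 0.4, 1.4],
    'species': ['setosa', 'setosa', 'setosa', 'versicolor'],
})

m = (df['petal_length'] == 1.6) & (df['petal_width'] == 0.2)

# np.where(condition) returns a tuple of arrays; the row positions are element 0
positions = np.where(m)[0]

# Positions are integer offsets, so use iloc together with the column's position
df.iloc[positions, df.columns.get_loc('species')] = np.nan
```

This does the same job as the boolean-mask versions above; the mask is usually simpler because it skips the round trip through integer positions.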
cs95
  • Thanks! This is really great! Any stabs on a loop (I am trying to get better at them but I am really bad at loops). – John Stud Jan 30 '19 at 22:32
  • 2
    @JohnStud There are cases when loops are useful, but I generally do not recommend their use on numeric data (especially when vectorised methods exist). Loops are good for string/regex operations. I have a detailed writeup on that here: [For loops with pandas - When should I care?](https://stackoverflow.com/questions/54028199/for-loops-with-pandas-when-should-i-care) – cs95 Jan 30 '19 at 22:33
  • Thanks again! Much appreciated help! – John Stud Jan 30 '19 at 22:35
  • Actually.. I am getting errors on every single one of these suggestions! – John Stud Jan 30 '19 at 23:16
  • @JohnStud Okay, that's not particularly... helpful. What does the error say? Please provide the error message as well. – cs95 Jan 30 '19 at 23:17
  • Sorry, I am not sure what the issue was. I just re-ran everything and got it to work. The first issue was; AttributeError: 'float' object has no attribute 'loc' – John Stud Jan 30 '19 at 23:46
  • @JohnStud You must've mistakenly reassigned `df` to something else (not a DataFrame, at least). – cs95 Jan 30 '19 at 23:55
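
For the loop asked about in the comments, a rough sketch under stated assumptions might look like the following. The frame is made up to stand in for the iris CSV, and `col` is a name introduced only for this example. It differs from the attempt in the question in two ways: it iterates over the array inside the tuple that np.where returns, and it assigns in place instead of rebinding `df`, which is the kind of reassignment that can leave `df` no longer being a DataFrame.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the iris CSV (illustrative values only)
df = pd.DataFrame({
    'petal_length': [1.4, 1.6, 1.6, 4.7],
    'petal_width': [0.2, 0.2, 0.4, 1.4],
    'species': ['setosa', 'setosa', 'setosa', 'versicolor'],
})

find_error = np.where((df['petal_length'] == 1.6) &
                      (df['petal_width'] == 0.2))

# Integer position of the column to overwrite
col = df.columns.get_loc('species')

# find_error is a 1-tuple; loop over the array inside it, not the tuple itself
for i in find_error[0]:
    # Assign in place; writing `df = ...` here would rebind df to something else
    df.iloc[i, col] = np.nan
```

This is for illustration only; on a frame of any real size, the vectorised assignments in the answer will be faster and shorter.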