I wanted to add a new column to a pandas df while iterating:

for index, row in df.iterrows():
    row["newcolumn"] = row["oldcolumn"].normalize() #normalize() is a custom function

This, however, leaves my df unchanged. Why is this?

lte__
  • See Note 2 [here](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.iterrows.html): *You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.* – jedwards Jun 08 '18 at 08:26
  • Why would you want to add a column whilst iterating row-wise? You could just do `df['newcolumn'] = df['oldcolumn'].normalize()` – EdChum Jun 08 '18 at 08:29
  • Seems like an XY problem. Please explain the rationale with a minimal (but complete) example and we may be able to advise a better workflow. – jpp Jun 08 '18 at 08:29

1 Answer

Use `df.loc` to write to the DataFrame itself rather than to `row`:

for index, row in df.iterrows():
    df.loc[index, "newcolumn"] = row["oldcolumn"].normalize() 
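
To see the difference between the two loops, here is a minimal sketch on a toy frame. The `normalize` function below is only a hypothetical stand-in for the custom function in the question (here it strips and lowercases a string), and the sample data is made up for illustration.

```python
import pandas as pd


def normalize(value):
    # Hypothetical stand-in for the custom function from the question.
    return value.strip().lower()


df = pd.DataFrame({"oldcolumn": [" Foo ", "BAR"]})

# Writing to `row` only changes the temporary Series yielded by iterrows().
for index, row in df.iterrows():
    row["newcolumn"] = normalize(row["oldcolumn"])
no_effect = "newcolumn" not in df.columns  # the DataFrame was not touched

# Writing through df.loc changes the DataFrame itself.
for index, row in df.iterrows():
    df.loc[index, "newcolumn"] = normalize(row["oldcolumn"])
result = df["newcolumn"].tolist()
```

After the first loop the frame still has only `oldcolumn`; after the second loop `newcolumn` holds the normalized values.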

For better performance, though, use `apply` if no vectorized solution exists:

df["newcolumn"] = df["oldcolumn"].apply(normalize)
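
As a rough illustration of `apply` next to a vectorized alternative, the sketch below again assumes a hypothetical `normalize` that strips and lowercases strings. For that particular stand-in, the `.str` accessor gives a vectorized equivalent; whether one exists for the real function depends on what it does.

```python
import pandas as pd


def normalize(value):
    # Hypothetical stand-in for the custom function from the question.
    return value.strip().lower()


df = pd.DataFrame({"oldcolumn": [" Foo ", "BAR"]})

# apply() calls normalize once per element and builds the new column in one step.
df["via_apply"] = df["oldcolumn"].apply(normalize)

# For this stand-in, the same result is available through vectorized string methods.
df["via_str"] = df["oldcolumn"].str.strip().str.lower()
```

Both columns end up with the same values, and neither needs an explicit Python loop over the rows.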
jezrael