Why can't I add a new column while iterating over a pandas DataFrame?

Question

I wanted to add a new column to a pandas df while iterating:

for index, row in df.iterrows():
    row["newcolumn"] = row["oldcolumn"].normalize() #normalize() is a custom function

This, however, leaves my df unchanged. Why is this?

See Note 2 [here](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.iterrows.html): *You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.* — jedwards, Jun 08 '18 at 08:26
Why would you want to add a column whilst iterating row-wise? You could just do `df['newcolumn'] = df['oldcolumn'].normalize()` — EdChum, Jun 08 '18 at 08:29
Seems like an XY problem. Please explain the rationale with a minimal (but complete) example and we may be able to advise a better workflow. — jpp, Jun 08 '18 at 08:29

jezrael · Answer 1 · 2018-06-08T08:33:00.547

1

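Assign through df.loc, so the write goes to the DataFrame itself rather than to the row that iterrows() hands you (which may be a copy):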

for index, row in df.iterrows():
    df.loc[index, "newcolumn"] = row["oldcolumn"].normalize()

For better performance, though, it is better to use apply if no vectorized solution exists:

df["newcolumn"] = df["oldcolumn"].apply(normalize)
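To make the difference concrete, here is a minimal, self-contained sketch comparing the approaches. The `normalize` function and the sample data are hypothetical stand-ins (the question does not show the real function); the sketch assumes it is a plain function that takes one value and returns the normalized value. The last variant uses the `.str` accessor, which is only an option if the normalization happens to be expressible with built-in string methods.

```python
import pandas as pd


def normalize(text):
    # Hypothetical stand-in for the asker's custom function.
    return text.strip().lower()


df = pd.DataFrame({"oldcolumn": ["  Foo ", "BAR", " Baz"]})

# 1) Writing to `row` only changes the temporary Series yielded by iterrows().
for index, row in df.iterrows():
    row["newcolumn"] = normalize(row["oldcolumn"])
loop_changed_df = "newcolumn" in df.columns  # the DataFrame has no new column

# 2) Writing through df.loc targets the DataFrame itself.
via_loc = df.copy()
for index, row in via_loc.iterrows():
    via_loc.loc[index, "newcolumn"] = normalize(row["oldcolumn"])

# 3) apply calls normalize once per value, without an explicit loop.
via_apply = df.copy()
via_apply["newcolumn"] = via_apply["oldcolumn"].apply(normalize)

# 4) Vectorized string methods, possible only because this particular
#    normalize can be written with built-in .str operations.
via_str = df.copy()
via_str["newcolumn"] = via_str["oldcolumn"].str.strip().str.lower()
```

Variants 2 to 4 should produce the same column; they differ in speed, with the row-by-row `loc` loop typically the slowest on large frames.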

edited Jun 08 '18 at 08:33

answered Jun 08 '18 at 08:26

