Persistence problems when using iterrows()

Question

As I believe someone also reported in this thread, filling in a dataframe using iterrows() can result in persistence problems. E.g. something as simple as:

import numpy as np
import pandas as pd

my_dataframe = pd.DataFrame(np.nan, index=range(5), columns=['foo', 'bar'])

for ix, row in my_dataframe.iterrows():
  row['foo'] = 'Hello'

results in no changes to the dataframe:

> my_dataframe
    foo  bar
0   NaN  NaN
1   NaN  NaN
2   NaN  NaN
3   NaN  NaN
4   NaN  NaN

And I got no warnings, no exceptions, etc. Is this intended, or is it a bug? What exactly is happening?

The above is with the latest stable version of Pandas, 0.13.1.

What is your use case here, usually you can avoid iterating rows. — Andy Hayden, Mar 10 '14 at 22:01
Thank you @Andy - My computation is both row and group-specific (i.e. the column in question captures a comparison of the row in relation to a group). More specifically, each row gets a weight that is the linear interpolation between the min and max of value of the group (on some other column). So my current workflow is: First group the data into partitions, and then iterate through each row computing the weight for each row. That said, you are probably right - there may be a way of doing this without iterating — Amelio Vazquez-Reina, Mar 10 '14 at 22:14
Sounds tricky... but possible. Perhaps worth asking a question about how to do it if you can come up with a toy example / desired result :) — Andy Hayden, Mar 10 '14 at 22:32

Andy Hayden · Accepted Answer · 2014-03-10T22:06:11.650

Assigning a string changes the dtype of the row, so you end up modifying a copy rather than the original frame.

Something keeping the dtype would have worked in this case:

In [11]: for ix, row in my_dataframe.iterrows():
   ....:       row['foo'] = 1

This behaviour isn't guaranteed, so it's much better to assign using loc or to assign the column directly:

In [12]: my_dataframe['foo'] = 'Hello'  # works

In [13]: my_dataframe.loc[:, 'foo'] = 'Hello'  # works

See "Returning a view versus a copy" in the docs.
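
A minimal sketch of the difference, assuming a frame with mixed column dtypes (in that case iterrows() has to build a new object Series for every row, so the row cannot be a view). The frame `df` and its columns are illustrative, not taken from the question:

```python
import pandas as pd

# Illustrative frame with mixed dtypes (an integer and a string column).
df = pd.DataFrame({"n": [1, 2, 3], "foo": ["a", "b", "c"]})

# Writing to the Series yielded by iterrows() does not reach the frame here:
# with mixed dtypes each row is a freshly built object Series, i.e. a copy.
for ix, row in df.iterrows():
    row["foo"] = "Hello"
unchanged = df["foo"].tolist()  # still the original values

# Assigning to the frame itself does persist.
df["foo"] = "Hello"       # replace the whole column
df.loc[0, "foo"] = "Hi"   # set a single cell by row label and column name
```

Whether the row is a view or a copy depends on the dtypes and the pandas version, which is why writing through the frame is the safer habit.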

I should add that you can do this inside the loop by assigning to the original frame (using loc/ix); however, you can (and should) usually avoid this by vectorising your solution rather than iterating over each row:

for ix, row in my_dataframe.iterrows():
    # works; .ix was removed in pandas 1.0, so .loc is used here
    my_dataframe.loc[ix, 'foo'] = 'Hello'
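
The use case described in the comments (a per-row weight interpolated between the group's min and max) could probably be vectorised along these lines. This is only a sketch: the column names `group` and `value`, the helper name `group_minmax_weight`, and the weight formula (value - min) / (max - min) are assumptions, not taken from the question.

```python
import pandas as pd


def group_minmax_weight(df, group_col, value_col):
    """Return each row's position between its group's min and max (0 to 1)."""
    grouped = df.groupby(group_col)[value_col]
    # transform() broadcasts the per-group result back to the original rows,
    # so lo and hi are aligned with df's index.
    lo = grouped.transform("min")
    hi = grouped.transform("max")
    # Groups where min == max give 0/0, which pandas reports as NaN.
    return (df[value_col] - lo) / (hi - lo)


# Hypothetical data: two groups with values in an arbitrary unit.
data = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1.0, 3.0, 5.0, 10.0, 20.0],
})
data["weight"] = group_minmax_weight(data, "group", "value")
```

No row loop is needed: the group statistics are computed once per group and the arithmetic runs on whole columns.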
