
What is the best way to do iterrows with a subset of a DataFrame?

Let's take the following simple example:

import datetime as DT
import pandas as pd

df = pd.DataFrame({
  'Product': list('AAAABBAA'),
  'Quantity': [5,2,5,10,1,5,2,3],
  'Start' : [
      DT.datetime(2013,1,1,9,0),
      DT.datetime(2013,1,1,8,5),
      DT.datetime(2013,2,5,14,0),
      DT.datetime(2013,2,5,16,0),
      DT.datetime(2013,2,8,20,0),
      DT.datetime(2013,2,8,16,50),
      DT.datetime(2013,2,8,7,0),
      DT.datetime(2013,7,4,8,0)]})

df = df.set_index(['Start'])

Now I would like to modify a subset of this DataFrame using the iterrows function, e.g.:

for i, row_i in df[df.Product == 'A'].iterrows():
    row_i['Product'] = 'A1' # actually a more complex calculation

However, the changes do not persist.

Is there any way (other than a manual lookup using the index 'i') to make the changes persist in the original DataFrame?

smci
Andy
  • Are you trying to apply a function to each row by taking arguments from different columns? This has already been [answered here](http://stackoverflow.com/questions/16353729/pandas-how-to-use-apply-function-to-multiple-columns). – dmvianna Feb 05 '15 at 04:46

2 Answers


Why do you need iterrows() for this? I think it's always preferable to use vectorized operations in pandas (or NumPy):

# .ix was removed in pandas 1.0; .loc does the same label/boolean-mask assignment
df.loc[df['Product'] == 'A', 'Product'] = 'A1'
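
If the per-row calculation is more involved than a constant, the same mask can be combined with a row-wise apply. The following is only a minimal sketch: the `relabel` function and its threshold are hypothetical stand-ins for the real calculation, and the frame is a cut-down version of the one in the question.

```python
import pandas as pd

# Cut-down version of the question's frame (the Start index is omitted here).
df = pd.DataFrame({
    'Product': list('AAAABBAA'),
    'Quantity': [5, 2, 5, 10, 1, 5, 2, 3],
})

def relabel(row):
    # Hypothetical rule standing in for the "more complex calculation".
    return 'A1' if row['Quantity'] >= 5 else 'A2'

# Select the subset once, compute one value per selected row,
# and write the results back into the original frame.
mask = df['Product'] == 'A'
df.loc[mask, 'Product'] = df.loc[mask].apply(relabel, axis=1)

print(df)
```

Because the assignment goes through `df.loc` on the original frame rather than through a row yielded by a loop, the change persists. Note that a row-wise apply is still a Python-level loop, so it is a convenience rather than a true vectorized operation.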
Roman Pekar
  • Thanks for your comment. This is a simple example; my actual use case is more complex and I need to use iterrows in it. – Andy Oct 29 '13 at 18:32
  • @Andy: then you want to make that clear in your question – smci Feb 05 '15 at 04:19

The best way that comes to mind is to build a new vector with the desired result, looping over it as much as you need, and then assign it back to the column:

# make a copy of the column
P = df.Product.copy()
# do the operation, or loop if you really must
P[P == "A"] = "A1"
# reassign to the original df
df["Product"] = P
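
If iterrows really is required, the same "compute separately, then assign back" idea can be sketched as below. This is an illustration under assumptions, not a definitive implementation: the rule that appends the quantity to the product name is made up to stand in for the real calculation.

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Product': list('AAAABBAA'),
    'Quantity': [5, 2, 5, 10, 1, 5, 2, 3],
    'Start': [
        DT.datetime(2013, 1, 1, 9, 0),
        DT.datetime(2013, 1, 1, 8, 5),
        DT.datetime(2013, 2, 5, 14, 0),
        DT.datetime(2013, 2, 5, 16, 0),
        DT.datetime(2013, 2, 8, 20, 0),
        DT.datetime(2013, 2, 8, 16, 50),
        DT.datetime(2013, 2, 8, 7, 0),
        DT.datetime(2013, 7, 4, 8, 0),
    ],
}).set_index('Start')

mask = df['Product'] == 'A'

# iterrows yields a new Series for each row, so writing to `row` would not
# touch df. Collect the results instead, in the order the rows are visited.
new_values = []
for i, row in df[mask].iterrows():
    # Hypothetical calculation: append the quantity to the product name.
    new_values.append(row['Product'] + str(row['Quantity']))

# One value per selected row, written back to the original frame in one step.
df.loc[mask, 'Product'] = new_values

print(df)
```

The loop only reads from the subset and the single `df.loc` assignment at the end does the writing, so no per-row lookup by the index `i` is needed.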
Magellan88