Assume I have a pandas.DataFrame, say

In [1]: df = pd.DataFrame([['a', 'x'], ['b', 'y'], ['c', 'z']],  
                          index=[10, 20, 30],  
                          columns=['first', 'second'])

In [2]: df
Out[2]:
   first second
10     a      x
20     b      y
30     c      z

and I want to update the first two entries of the first column with the corresponding entries of the second column. First I tried

to_change = df.index <= 20
df[to_change]['first'] = df[to_change]['second']

but this does not work: df is left unchanged. However,

df['first'][to_change] = df['second'][to_change]

works fine.

Can anyone explain? What is the rationale behind this behavior? Although I use pandas a lot, I find that these kinds of issues sometimes make it hard to predict what a particular piece of pandas code will actually do. Maybe someone can provide an insight that helps me improve my mental model of the inner workings of pandas.

levzettelin
  • It has to do with getting a `view` or a `copy` in return. See, for example, the discussion in this post: http://stackoverflow.com/questions/14192741/understanding-pandas-dataframe-indexing – Rutger Kassies Nov 07 '13 at 10:10
  • Yes, the answer is there, so this question is a duplicate. Thank you. – levzettelin Nov 07 '13 at 10:20

1 Answer

In master/0.13 (to be released very shortly), this now warns that you are modifying a copy. An option controls whether pandas warns, raises, or ignores it:

In [1]: df = pd.DataFrame([['a', 'x'], ['b', 'y'], ['c', 'z']],  
   ...:                           index=[10, 20, 30],  
   ...:                           columns=['first', 'second'])

In [2]: df
Out[2]: 
   first second
10     a      x
20     b      y
30     c      z

In [3]: to_change = df.index <= 20

In [4]: df[to_change]['first'] = df[to_change]['second']
pandas/core/generic.py:1008: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  warnings.warn(t,SettingWithCopyWarning)

In [5]: df['first'][to_change] = df['second'][to_change]
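
For completeness, here is a minimal sketch of the `.loc` form that the warning message recommends, next to the chained form from the question. It assumes a reasonably recent pandas; the names `after_chained` and `after_loc` are only illustrative. Note that in newer releases with Copy-on-Write enabled, the `df['first'][to_change] = ...` variant is, as far as I know, not expected to modify `df` either, so a single `.loc` call is the form to rely on.

```python
import warnings

import pandas as pd

df = pd.DataFrame([['a', 'x'], ['b', 'y'], ['c', 'z']],
                  index=[10, 20, 30],
                  columns=['first', 'second'])
to_change = df.index <= 20

# Chained form: df[to_change] builds a new DataFrame (boolean masks copy),
# so the assignment lands on that temporary object and df is untouched.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the chained-assignment warning
    df[to_change]['first'] = df[to_change]['second']
after_chained = df['first'].tolist()

# Single .loc call: rows and column are selected in one indexing operation,
# so pandas writes directly into df.
df.loc[to_change, 'first'] = df.loc[to_change, 'second']
after_loc = df['first'].tolist()
```

The difference is that the chained form is two separate operations (a `__getitem__` followed by a `__setitem__` on whatever the first call returned), while `.loc[rows, col] = value` is one `__setitem__` on `df` itself.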
Jeff