I have a pandas dataframe with two columns: x and value. I want to find all the rows where x == 10, and for all these rows set value = 1,000. I tried the code below but I get the warning that

A value is trying to be set on a copy of a slice from a DataFrame.

I understand I can avoid this by using .loc or .ix, but I would first need to find the location or the indices of all the rows which meet my condition of x == 10. Is there a more direct way?

Thanks!

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['x'] = np.arange(10, 14)
df['value'] = np.arange(200, 204)

print(df)

df[df['x'] == 10]['value'] = 1000  # this doesn't work: it assigns to a temporary copy

print(df)
Pythonista anonymous
  • Sorry, what's wrong with using the recommended `df.loc[df['x'] == 10, 'value'] = 1000`? – EdChum Sep 02 '15 at 15:34
  • Thanks, I hadn't realised this was an option. Maybe it's just me, maybe it's because I am too used to SQL and too new to Pandas, but I still find that some tasks which are banal in SQL are messy in pandas, and the documentation isn't very clear. – Pythonista anonymous Sep 02 '15 at 15:40
  • The [docs](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) are pretty clear and the warning is there to tell you that what you're doing may not work – EdChum Sep 02 '15 at 15:41
  • I don't find them clear at all. There is no comparison between the pandas docs and those of a commercial product like Matlab. E.g. if you look up the docs for pandas.DataFrame.loc, the explanation of the syntax is very brief. It points to selection by label, but even there I couldn't find an example like the one you posted above. There is a 'comparison with sql' section, but it doesn't have your example either, and the 'update' subsection is empty. – Pythonista anonymous Sep 02 '15 at 15:56

1 Answer


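You should use loc so that the selection and the assignment happen in a single indexing operation on the original DataFrame, rather than on a possible copy. For your example, the following will work and not raise a warning: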

df.loc[df['x'] == 10, 'value'] = 1000

So the general form is:

df.loc[<mask or index label values>, <optional column>] = <new scalar value or array-like>
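
As a rough illustration of that general form, here is a small sketch built on the question's example frame. The second and third assignments (the `mask` variable, the values `[5, 6]` and `-1`) are made up for illustration and are not part of the original question:

```python
import numpy as np
import pandas as pd

# Same data as in the question: x = 10..13, value = 200..203
df = pd.DataFrame({"x": np.arange(10, 14), "value": np.arange(200, 204)})

# Boolean mask + column label, scalar on the right-hand side
df.loc[df["x"] == 10, "value"] = 1000

# Boolean mask + column label, array-like on the right-hand side
# (its length must match the number of selected rows, here two)
mask = df["x"] > 11  # illustrative condition, not from the question
df.loc[mask, "value"] = [5, 6]

# Index label instead of a mask
df.loc[1, "value"] = -1

print(df)
```

In each case the row selector and the column label go into the same `.loc[...]` call, so pandas assigns directly into `df` instead of into an intermediate object.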

The docs highlight this error, and there is also an introductory section. Granted, some of the function docs are sparse, so feel free to submit improvements.

EdChum