Sorry if this is a simple question; I've tried to look for a solution but can't find anything.

My code goes like this:

  • given zip1, create a boolean mask to select the observations (other zip codes) where the calculation has not been done yet (Distances still holds the placeholder value 666)

    I = (df['zip1'] == zip1) & (df['Distances'] == 666)
    
  • perform some calculation

    distances = calc(zip1,df['zip2'][I])
    

So far so good: I've checked the distances variable, and it holds the correct values in a correctly sized array.

  • put the distance variable in the right place

    df['Distances'][I] = distances
    

But this last step sets df['Distances'] to nonsense values for ALL observations with df['zip1'] == zip1, instead of only the ones selected by I.

I've checked the boolean array I just before the df['Distances'][I] = distances command and it looks fine. Any ideas would be greatly appreciated.

RMcG
AsianYayaToure
  • You need to use `.loc` or `.ix` rather than chained assignment; see [this](http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy), also [related](http://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining). The correct usage is `df.loc[I, 'Distances'] = distances` – EdChum Oct 30 '13 at 16:56
  • Do you have a working example to test? – Roman Pekar Oct 30 '13 at 18:08
  • The suggestion by EdChum worked. Makes sense, never knew about this view vs copy stuff before. Thanks. – AsianYayaToure Oct 30 '13 at 20:40

1 Answer

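What you are attempting is called chained assignment, and it does not work the way you expect. `df['Distances'][I] = distances` is two separate indexing operations, and the first one, `df['Distances']`, may hand back a copy rather than a view of the original data. The assignment then does not reliably land on the rows you selected, hence the wrong values you see.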

There is more information about it in the pandas documentation on views versus copies (linked in the comment above), and in related questions.

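So you should do the selection and the assignment in a single indexing step with .loc (or .ix on older pandas versions; .ix was removed in pandas 1.0), like so: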

    df.loc[I, 'Distances'] = distances
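
For anyone who wants to try this out, here is a minimal self-contained sketch of the same workflow. The data and the `calc` function are made up for illustration (the question does not show the real `calc`, so a simple absolute difference stands in for it), and `SENTINEL` is just a name I've given to the 666 placeholder:

```python
import numpy as np
import pandas as pd

SENTINEL = 666  # placeholder from the question: "distance not computed yet"


def calc(zip1, zip2_values):
    """Hypothetical stand-in for the asker's calc(): not a real distance,
    just the absolute difference between the zip code numbers."""
    return np.abs(zip2_values.to_numpy() - zip1)


# Made-up sample data; the second row already has a computed distance.
df = pd.DataFrame({
    "zip1": [10001, 10001, 10001, 20002],
    "zip2": [10003, 10010, 10020, 20005],
    "Distances": [SENTINEL, 5, SENTINEL, SENTINEL],
})

zip1 = 10001

# Boolean mask: rows for this zip1 whose distance is still the placeholder.
I = (df["zip1"] == zip1) & (df["Distances"] == SENTINEL)

# One value per selected row, in the same order as the selected rows.
distances = calc(zip1, df.loc[I, "zip2"])

# Row selection and column selection in a single .loc call, so the
# assignment goes to the original DataFrame rather than a temporary object.
df.loc[I, "Distances"] = distances

print(df)
```

Only the first and third rows should change here: the second row already had a value, and the fourth belongs to a different zip1. Note that `distances` needs exactly one element per `True` in `I`, otherwise pandas raises an error on the assignment.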
EdChum