I'm getting different results from the two commands below, and I'm wondering how they differ. Can someone please explain?

data['Sales_dummy'][data['sales']>=data['sales'].median()]=1

OR

data[data['sales']>=data['sales'].median()]['Sales_dummy']=1

Sajan
Divya R
  • You're asking about the difference between `data['Sales_dummy'][...] = 1` and `data[...]['Sales_dummy'] = 1`? The pseudo-boolean expression used as one of the indices isn't really relevant. – chepner Mar 13 '20 at 19:54
  • The difference comes down to memory layout: in the first one you are probably getting a view rather than a copy, so the assignment works. This is chained indexing and is the main reason behind the infamous [SettingWithCopyWarning](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). Use `loc`: `data.loc[data['sales'] >= data['sales'].median(), 'Sales_dummy'] = 1` – ayhan Mar 13 '20 at 20:00
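A minimal sketch of the `.loc` approach from the comment above, assuming a small made-up DataFrame: the `sales` values and the starting value `0` for `Sales_dummy` are assumptions, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({"sales": [10, 20, 30, 40, 50], "Sales_dummy": 0})
mask = data["sales"] >= data["sales"].median()  # the median is 30 for this data

# A single .loc call selects the rows (by mask) and the column (by label)
# together, so the assignment is applied to `data` itself rather than to an
# intermediate object produced by a first indexing step.
data.loc[mask, "Sales_dummy"] = 1
print(data["Sales_dummy"].tolist())  # [0, 0, 1, 1, 1]

# Alternative: rebuild the whole column from the mask (overwrites every row).
data["Sales_dummy"] = mask.astype(int)
print(data["Sales_dummy"].tolist())  # [0, 0, 1, 1, 1]
```

The second assignment writes 0 or 1 to every row, so it only fits if `Sales_dummy` holds no other values worth keeping; the `.loc` form touches only the selected rows.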

1 Answer

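`data` is indexed with a different key first in each assignment: the first form starts with `data['Sales_dummy']`, which selects a column, while the second starts with `data[data['sales']>=data['sales'].median()]`, which selects rows using a boolean mask. Selecting rows with a boolean mask returns a new DataFrame (a copy), so the second form assigns `'Sales_dummy'` on that temporary object and leaves `data` unchanged.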
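A sketch of why the two forms behave differently, using hypothetical sample data (made-up `sales` values, `Sales_dummy` starting at 0). Reading in either order gives the same values, but `data[mask]` builds a new DataFrame, so an assignment made on it never reaches `data`. Whether the first form, `data['Sales_dummy'][mask] = 1`, writes through to `data` depends on the pandas version and on whether Copy-on-Write is enabled (with Copy-on-Write it does not), so the sketch leaves it out.

```python
import warnings

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({"sales": [10, 20, 30, 40, 50], "Sales_dummy": 0})
mask = data["sales"] >= data["sales"].median()

# Reading: both orders select the same values (column 'Sales_dummy',
# rows where mask is True).
col_then_rows = data["Sales_dummy"][mask]
rows_then_col = data[mask]["Sales_dummy"]
print(col_then_rows.equals(rows_then_col))  # True

# Writing: data[mask] is a new DataFrame, so the column assignment lands on
# that temporary object. pandas may warn about this chained assignment
# (the exact warning depends on the version); the warning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    data[mask]["Sales_dummy"] = 1

print(data["Sales_dummy"].tolist())  # [0, 0, 0, 0, 0]: data is unchanged
```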

In general, indexing of nested data structures is not commutative: `d[x][y]` and `d[y][x]` need not refer to the same value.
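The same point with a plain nested dict (a hypothetical `d`, unrelated to the question's data): swapping the key order reaches a different value, or fails outright when the structure is not symmetric. For a pandas DataFrame, a column label and a boolean row mask read the same values in either order; there the order matters because it decides which object receives the assignment.

```python
# Hypothetical nested dict, used only to show that key order matters.
d = {"x": {"y": 1}, "y": {"x": 2}}
print(d["x"]["y"])  # 1
print(d["y"]["x"])  # 2

# With an asymmetric structure, one order is not even valid.
e = {"x": {"y": 1}}
try:
    e["y"]["x"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'y'
```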

chepner