Suppose that you have a pandas DataFrame named df with columns ['a','b','c','d','e'] and you want to create a new DataFrame newdf with columns 'b' and 'd'. There are two possible ways to do this:

newdf = df[['b','d']]

or

newdf = df.loc[:,['b','d']]

The first is using the indexing operator. The second is using .loc. Is there a reason to prefer one over the other?

Irv
  • https://stackoverflow.com/questions/38886080/python-pandas-series-why-use-loc – BENY Mar 25 '19 at 20:42
  • Depending on what you do with the slices once you obtain them, you might run into a [SettingWithCopyWarning](https://stackoverflow.com/a/53954986/4909087) which might prevent you from being able to make updates depending on whether you're dealing with a view or copy. – cs95 Mar 25 '19 at 20:43
  • @coldspeed I think both operations produce a copy of the data, so you won't get that warning with operations on `newdf`. (Note - I modified the code a little to be able to reference `newdf`) – Irv Mar 25 '19 at 21:07
  • `v = df[['b']]; v['b'] = 3` throws `SettingWithCopyWarning`. The warning is raised on chained assignments, regardless of whether you have a view or copy. You don't have this issue with `loc`. That's my point. – cs95 Mar 25 '19 at 21:09

1 Answer

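Thanks to @coldspeed: it seems that newdf = df.loc[:,['b','d']] is preferred, because a later assignment such as newdf['b'] = 3 can raise the dreaded SettingWithCopyWarning when newdf was created with df[['b','d']], while the .loc form avoids it.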
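
For anyone who wants to check this, here is a minimal sketch. The sample data is made up for illustration. The explicit .copy() is not something suggested in the comments above; it is a common way to state the intent regardless of which selection form is used. Whether the warning appears at all may depend on your pandas version, so the sketch only checks things that should hold either way.

```python
import pandas as pd

# Hypothetical sample frame with the columns from the question.
df = pd.DataFrame({col: range(3) for col in ["a", "b", "c", "d", "e"]})

# Both forms select the same two columns.
newdf_brackets = df[["b", "d"]]
newdf_loc = df.loc[:, ["b", "d"]]
same_result = newdf_brackets.equals(newdf_loc)

# Taking an explicit copy makes it clear that newdf is independent of df,
# so assigning to it afterwards does not touch the original frame.
newdf = df[["b", "d"]].copy()
newdf["b"] = 3
original_unchanged = df["b"].tolist() == [0, 1, 2]
```

So the two forms return the same data; the difference discussed in the comments is only about what happens when you later assign into the result.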

Irv