Preferred pandas code for selecting all rows and a subset of columns

Question

Suppose that you have a pandas DataFrame named df with columns ['a','b','c','d','e'] and you want to create a new DataFrame newdf with columns 'b' and 'd'. There are two possible ways to do this:

newdf = df[['b','d']]

or

newdf = df.loc[:,['b','d']]

The first is using the indexing operator. The second is using .loc. Is there a reason to prefer one over the other?

https://stackoverflow.com/questions/38886080/python-pandas-series-why-use-loc — BENY, Mar 25 '19 at 20:42
Depending on what you do with the slices once you obtain them, you might run into a [SettingWithCopyWarning](https://stackoverflow.com/a/53954986/4909087) which might prevent you from being able to make updates depending on whether you're dealing with a view or copy. — cs95, Mar 25 '19 at 20:43
@coldspeed I think both operations produce a copy of the data, so you won't get that warning with operations on `newdf`. (Note - I modified the code a little to be able to reference `newdf`) — Irv, Mar 25 '19 at 21:07
`v = df[['b']]; v['b'] = 3` throws `SettingWithCopyWarning`. The warning is raised on chained assignments, regardless of whether you have a view or copy. You don't have this issue with `loc`. That's my point. — cs95, Mar 25 '19 at 21:09

score 0 · Answer 1 · answered Mar 25 '19 at 22:13


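Thanks to @coldspeed: both forms select the same data, but newdf = df.loc[:,['b','d']] seems to be the preferred one, because a later assignment to newdf then avoids the dreaded SettingWithCopyWarning that df[['b','d']] can trigger.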
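For illustration, here is a minimal sketch of the two selections side by side. The sample values and the explicit `.copy()` call are assumptions added for this example and do not come from the thread; `.copy()` is just one common way to make it obvious that `newdf` is independent of `df`.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10]}
)

# The two spellings from the question.
by_brackets = df[["b", "d"]]
by_loc = df.loc[:, ["b", "d"]]

# Both select the same rows and columns.
same_result = by_brackets.equals(by_loc)

# Assumption: if newdf will be modified later, take an explicit copy so the
# assignment below clearly targets newdf and not df.
newdf = df.loc[:, ["b", "d"]].copy()
newdf["b"] = 0

# The original frame is left unchanged by the assignment above.
original_untouched = df["b"].tolist() == [3, 4]
```

Whether the warning shows up at all depends on the pandas version in use, so treat this as a sketch rather than a rule; with an explicit copy the assignment is unambiguous either way.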


Irv
