
Let's take this data frame below as an example:

import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [True, True, False, False]
})

>df
   a  b      c
0  1  2   True
1  2  4   True
2  3  6  False
3  4  8  False

I know several ways to select column a where column c is equal to True:

First way:

df.loc[df.c == True, 'a']

Second way:

df.loc[df['c'] == True, 'a']

Third way:

df.a[df['c'] == True]

All those get the same result:

0    1
1    2
Name: a, dtype: int64

Other expressions, such as df.a[df.c == True], also work. I am just wondering whether there is any difference between the indexing operations .loc, [ ] and the dot (.).

freefrog
  • https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation – cs95 Jul 27 '17 at 13:53
  • If you have a column named `loc`, would you use `df['loc']`, the column, or `df.loc`, the function? – OneCricketeer Jul 27 '17 at 13:54
  • Or sometimes your column names contain special characters like `.`, a space or `_`; then just use `df['name']` – BENY Jul 27 '17 at 13:57
  • Possible duplicate of [In a Pandas DataFrame, what's the difference between using squared brackets or dot to 'cal a column?](https://stackoverflow.com/questions/41130255/in-a-pandas-dataframe-whats-the-difference-between-using-squared-brackets-or-d) – OneCricketeer Jul 27 '17 at 13:57
  • Possible duplicate of [pandas iloc vs ix vs loc explanation?](https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation) – Kacper Wolkowski Jul 28 '17 at 13:45

1 Answer

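For selecting an existing column, .a and ["a"] return the same Series in pandas, as answered in the question linked by @cricket_007: In a Pandas DataFrame, what's the difference between using squared brackets or dot to 'cal a column? The bracket form is still required when the column name is not a valid Python identifier or clashes with an existing DataFrame attribute such as loc.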
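A minimal sketch of where the two spellings stop being interchangeable, following the comments under the question. The frame `odd` and its column names are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical frame: one ordinary column, one that clashes with the
# DataFrame.loc indexer, and one whose name contains a space.
odd = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'loc': [10, 20, 30, 40],
    'my col': [5, 6, 7, 8],
})

# For an ordinary column both spellings return the same Series.
same = odd.a.equals(odd['a'])

# Attribute access resolves to the existing .loc indexer, not the column.
loc_is_column = isinstance(odd.loc, pd.Series)

# Bracket access reaches the column whatever its name is.
loc_values = odd['loc'].tolist()
spaced_values = odd['my col'].tolist()

print(same, loc_is_column, loc_values, spaced_values)
```

Bracket access is therefore the safer default in code that does not control its column names.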


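However, when you index with [] and a condition, you are passing a sequence of True and False values (a boolean mask). Wrapping the column in a list literal shows those values: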

[df.c]

prints:

[0     True
 1     True
 2    False
 3    False
 Name: c, dtype: bool]

And:

type([df.c]) # prints 'list', but only because of the surrounding brackets; df.c itself is a boolean Series

In other words, these two select the same rows:

df[df.c] 
df[[True,True,False,False]]
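The equivalence of the two masks can be checked directly. This is a small sketch that rebuilds the question's frame; the variable names `mask`, `by_series` and `by_list` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [True, True, False, False],
})

mask = df.c                                   # a boolean Series, not a list
by_series = df[mask]                          # rows where the Series is True
by_list = df[[True, True, False, False]]      # same mask as a plain list

mask_type = type(mask).__name__
rows_match = by_series.equals(by_list)

print(mask_type, rows_match)
print(by_series)
```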

This is not the same operation as .loc, a DataFrame indexer that selects the rows and the column in one call instead of two chained ones, and that seems to be the fastest on your sample:

%timeit df[df.c].a
1000 loops, best of 3: 437 µs per loop

%timeit df.a[df.c]
1000 loops, best of 3: 387 µs per loop

%timeit df.loc[df.c, 'a'] #equal to df.loc[df["c"], "a"]
1000 loops, best of 3: 210 µs per loop
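
To repeat the comparison outside IPython, something like the following sketch may help. It uses the standard-library `timeit` module instead of the `%timeit` magic, and the `candidates` dictionary is only an illustrative helper. Absolute numbers depend on the machine and the pandas version, so they may not match the figures above.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [True, True, False, False],
})

# The three spellings from the timings above, wrapped as callables.
candidates = {
    "df[df.c].a": lambda: df[df.c].a,
    "df.a[df.c]": lambda: df.a[df.c],
    "df.loc[df.c, 'a']": lambda: df.loc[df.c, 'a'],
}

# All three return the same Series; only the route to it differs.
results = {name: fn() for name, fn in candidates.items()}
reference = results["df.loc[df.c, 'a']"]
all_equal = all(r.equals(reference) for r in results.values())

runs = 1000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name:<20} {seconds / runs * 1e6:8.1f} us per call")
```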
Anton vBR