Fastest way to access pandas column

Question

I am confused by the difference in performance between the various ways to access a pandas column.

In [1]: import pandas as pd; df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], columns=['a', 'b', 'c'])

In [2]: %timeit df['a']
The slowest run took 75.37 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 3.12 µs per loop

In [3]: %timeit df.a
The slowest run took 5.14 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 6.59 µs per loop

In [4]: %timeit df.loc[:,'a']
10000 loops, best of 3: 55 µs per loop

I understand that the last variant is slower because it enables the values to be set, not just accessed. But why is `df.a` slower than `df['a']`? This seems to hold regardless of whether the intermediate results are cached.

score 2 · Answer 1 · answered Jul 10 '17 at 06:15

Here is a link that explains the difference between `.` access and `[]` access.

Also look into the behavior of these operators in the documentation for the `__getitem__` (for `[]`) and `__getattr__` (for `.`) methods.

`.` seems to access the column through a function call, and would therefore take less time than `[]`, which is accessed like a dictionary key-value lookup.

Sriram Sitharaman

Thank you. This is helpful. However, in the pandas case the function call `.` takes twice as long as `[]`, contrary to what you suggest. Any ideas? – amball Jul 10 '17 at 16:40
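
One possible explanation, sketched with a toy class rather than pandas itself: Python only calls `__getattr__` after normal attribute lookup has failed, so a `__getattr__` that then delegates to `__getitem__` does everything `[]` does plus the failed lookup and one more call. As far as I can tell, pandas resolves `df.a` in roughly this way (it checks that the name is a column label and then returns `self[name]`), but treat that as an assumption to verify against the pandas source. `ToyFrame` and its `calls` list are hypothetical names made up for this illustration.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self.calls = []  # records which special methods ran, in order

    def __getitem__(self, key):
        # Invoked directly by frame[key].
        self.calls.append("__getitem__")
        return self._columns[key]

    def __getattr__(self, name):
        # Python reaches this method only after normal attribute lookup fails.
        # Private names and unknown columns are rejected here.
        if name.startswith("_") or name not in self._columns:
            raise AttributeError(name)
        self.calls.append("__getattr__")
        return self[name]  # attribute access falls back to item access


frame = ToyFrame({"a": [1, 2], "b": [1, 2]})

frame["a"]
item_calls = list(frame.calls)

frame.calls.clear()
frame.a
attr_calls = list(frame.calls)

print(item_calls)  # ['__getitem__']
print(attr_calls)  # ['__getattr__', '__getitem__']
```

In this sketch `frame.a` always runs the same `__getitem__` as `frame["a"]`, preceded by extra work, which would be consistent with `df.a` being slower than `df['a']` by a few microseconds.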
