25

So .loc and .iloc are not your typical functions. They somehow use [ and ] to surround the arguments so that it is comparable to normal array indexing. However, I have never seen this in another library (that I can think of; maybe numpy has something like this that I'm blanking on), and I have no idea how it technically works or is defined in the Python code.

Are the brackets in this case just syntactic sugar for a function call? If so, how would one make an arbitrary function use brackets instead of parentheses? Otherwise, what is special about their use/definition in Pandas?

Henry Ecker
Conner Phillips
  • 4
    The square brackets are syntactic sugar for the special method `__getitem__`. All objects can implement this method in their class definition and then subsequently work with the square brackets. – Ted Petrou Sep 12 '17 at 12:44
  • 1
    Have a look at the [Pandas documentation on indexing and selection](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-and-selecting-data). – Scott Boston Sep 12 '17 at 12:45
  • 1
    You can have a look at the source code [here](https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexing.py) which is all written in Python. The `__getitem__` method is defined in `_LocationIndexer`. – roganjosh Sep 12 '17 at 13:00
  • 12
    That question linked is not at all a duplicate of what is being asked. I'm sure there is a good answer somewhere but that isn't it @coldspeed – Ted Petrou Sep 12 '17 at 14:03
  • 1
    Why the duck this question marked as duplicate. Doomb af @rayreng – Poojan Jul 29 '19 at 15:26
  • along the lines of syntactic sugar, but perhaps further down the line of thinking would be to make it look like R, or as R, to make it look like the access to arrays and maps/hashes in many other languages. It does add a kink to the path in finding what is called and how it works. – Chris Oct 18 '19 at 23:04

2 Answers

13

Note: The first part of this answer is a direct adaptation of my answer to this other question, that was answered before this question was reopened. I expand on the "why" in the second part.

So .loc and .iloc are not your typical functions

Indeed, they are not functions at all. I'll use loc in the examples; iloc is analogous (it uses different internal classes). The simplest way to check what loc actually is:

import pandas as pd
df = pd.DataFrame()
print(df.loc.__class__)

which prints

<class 'pandas.core.indexing._LocIndexer'>

This tells us that df.loc is an instance of the _LocIndexer class. The loc[] syntax derives from the fact that _LocIndexer defines __getitem__ and __setitem__*, which are the methods Python calls whenever you use the square-bracket syntax.

So yes, brackets are, technically, syntactic sugar for a method call, just not a call to the function you thought it was (there are of course many reasons why Python is designed this way; I won't go into the details here because 1) I am not sufficiently expert to provide an exhaustive answer and 2) there are a lot of better resources on the web about this topic).

*Technically, it's its base class _LocationIndexer that defines those methods, I'm simplifying a bit here
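To address the "how would one make an arbitrary function use brackets" part directly, here is a minimal sketch of the same pattern in plain Python. The names `Table` and `_RowIndexer` are hypothetical, and this is not how pandas implements `_LocIndexer`; it only illustrates an attribute that returns an object defining `__getitem__` and `__setitem__`.

```python
class _RowIndexer:
    """Hypothetical helper: the object that actually handles the brackets."""

    def __init__(self, table):
        self._table = table

    def __getitem__(self, key):
        # Called for: table.row[key]
        return self._table._rows[key]

    def __setitem__(self, key, value):
        # Called for: table.row[key] = value
        self._table._rows[key] = value


class Table:
    """Hypothetical container exposing a bracket-based accessor."""

    def __init__(self, rows):
        self._rows = dict(rows)

    @property
    def row(self):
        # Like df.loc, this is an attribute returning an indexer object,
        # not a method that takes the key as an argument.
        return _RowIndexer(self)


table = Table({"a": 1, "b": 2})
print(table.row["a"])              # read goes through __getitem__
table.row["b"] = 20                # write goes through __setitem__
print(table.row.__getitem__("b"))  # the explicit form of table.row["b"]
```

So the brackets never apply to a function: they apply to whatever object the attribute returns, and that object's class decides what they mean.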


Why does Pandas use square brackets with .loc and .iloc?

I'm entering speculative territory here, because I couldn't find any document that explicitly discusses design choices in Pandas. However, there are at least two good reasons I see for choosing the square brackets.

The first, and most important, reason is: you simply can't do with a function call everything you can do with the square-bracket notation, because assigning to a function call is a syntax error in Python:

# contrived example to show this can't work
a = []
def f():
    return a
f().append(1)  # OK: mutates the list that f() returns
f() = dict()   # SyntaxError: cannot assign to function call

Using round brackets for a "function" call invokes the underlying __call__ method (note that any class that defines __call__ is callable, so "function" call is an imprecise term: Python doesn't care whether something is a function or just behaves like one).
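A small sketch of both points, using only the standard library. `Doubler` is a hypothetical class made callable through `__call__`, and the built-in `compile` is used here only to show that an assignment to a call is rejected before any code runs.

```python
class Doubler:
    """Hypothetical callable object: not a function, but usable like one."""

    def __call__(self, x):
        return 2 * x


double = Doubler()
print(callable(double))  # True: defining __call__ is enough
print(double(21))        # equivalent to double.__call__(21)

# Assigning to a call is rejected at compile time, whatever the callee is.
try:
    compile("double(21) = 0", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```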

Using square brackets, instead, calls either __getitem__ or __setitem__ depending on where the expression appears (__setitem__ if it's the target on the left of an assignment operator, __getitem__ when the value is read; del obj[key] calls a third method, __delitem__). There is no way to mimic this behaviour with a function call: you'd need a separate setter method to modify the data in the dataframe, and it still couldn't be used as the target of an assignment:

# imaginary method-based alternative to the square bracket notation:
my_data = df.get_loc(my_index)
df.set_loc(my_index, my_data*2)

This example brings me to the second reason: consistency. You can access elements of a DataFrame via square brackets:

something = df['a']
df['b'] = 2*something

When using loc you're still referring to some items in the DataFrame, so it's more consistent to use the same syntax instead of asking the user to call getter and setter functions (it's also, I believe, "more pythonic", but that's a fuzzy concept I'd rather stay away from).
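The dispatch described above can be observed with a small tracing class. `Tracer` is a hypothetical name and has nothing to do with pandas internals; it only records which special method Python calls for each bracket expression and which key it receives, including the `slice` objects that Python builds from the `:` syntax.

```python
class Tracer:
    """Hypothetical class that logs every bracket operation it receives."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return 0

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))

    def __delitem__(self, key):
        self.calls.append(("del", key))


t = Tracer()
t["a"]          # read
t["a"] = 1      # assignment target
t["a"] += 5     # augmented assignment: a read followed by a write
del t["a"]      # deletion
t[1:3, "col"]   # the key arrives as a tuple containing a slice object

for call in t.calls:
    print(call)
```

The last line is the part a plain function call cannot express without spelling out `slice(1, 3)` by hand, since the `1:3` syntax is only valid inside square brackets.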

GPhilo
  • 1
    Regarding the "why" part, using `:` for slices might be another reason. – user202729 Feb 04 '21 at 13:43
  • Good point! You could theoretically obtain the same effect in functions by explicitly passing `slice()` instances, but that's horrible – GPhilo Feb 04 '21 at 13:47
2

Under the covers, both use the __getitem__ and __setitem__ methods.

Batman
  • 3
    It doesn't answer the question. What can you do by routing to __getitem__ that you couldn't do directly without it...*besides* the square brackets? – Chris Oct 18 '19 at 15:47