I am a bit confused about when one should make a column an index in a df. My understanding was that indices identify unique observations within the df (e.g. id and time). However, why make those an index instead of just keeping them as columns in the df? It looks as if pretty much all operations in pandas can be done using columns rather than indices (`merge`, selection using `query`, etc.). There must be very specific cases where indices prove beneficial: could someone provide some examples?

Alex
- The complexity of selection by index value is much lower than that of selection by a boolean array (i.e. filtering by column values). See [this question](https://stackoverflow.com/questions/45240803) for more details. – Marat Jun 22 '18 at 02:42
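
A rough way to check this on your own machine is to time repeated lookups both ways. The sketch below uses made-up data (100,000 unique string keys in a hypothetical `key` column). The boolean filter compares every element of the column on each call, while `.loc` on the indexed frame asks the index for the row position. No expected timings are given because they depend on hardware and pandas version; as far as I know, the index builds its lookup table lazily on first use, so the gain shows up mainly when you do many lookups.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows with unique string keys in a column.
n = 100_000
df = pd.DataFrame({"key": [f"k{i}" for i in range(n)], "value": np.arange(n)})
indexed = df.set_index("key")

target = "k54321"


def by_mask():
    # Boolean filtering: builds a length-n mask by comparing every key with target.
    return df[df["key"] == target]


def by_index():
    # Label lookup: the index locates the row directly (returns a Series here,
    # because the index is unique).
    return indexed.loc[target]


mask_time = timeit.timeit(by_mask, number=20)
index_time = timeit.timeit(by_index, number=20)
print(f"boolean mask: {mask_time:.4f}s, index lookup: {index_time:.4f}s")
```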
- Even if you sort the df by the column? I would think it would be exactly the same, no? Unless indices are somehow hashed... are they? – Alex Jun 22 '18 at 02:55
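
`Index.get_loc` gives a hint of what happens under the hood, because its return type depends on the shape of the index. My understanding (an implementation detail that may change between versions) is that a unique index is looked up through a hash table, a sorted index with duplicates can be searched and returns a contiguous slice, and an unsorted index with duplicates falls back to a boolean mask over all labels. The sketch only demonstrates the return types:

```python
import pandas as pd

# Unique index: the label maps to a single integer position.
unique = pd.Index(["a", "b", "c"])
print(unique.get_loc("b"))  # 1

# Sorted (monotonic) index with duplicates: matches are contiguous, so a slice
# is returned.
sorted_dupes = pd.Index(["a", "b", "b", "c"])
print(sorted_dupes.get_loc("b"))  # slice(1, 3, None)

# Unsorted index with duplicates: a boolean mask marking every matching label.
unsorted_dupes = pd.Index(["b", "a", "b"])
print(unsorted_dupes.get_loc("b"))

# Properties pandas can check to decide which lookup strategy applies.
print(unique.is_unique, sorted_dupes.is_monotonic_increasing)
```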
- Convenience is one of the reasons. It is easier to write `df.loc['Mary']` than `df[df['name']=='Mary']`. – DYZ Jun 22 '18 at 03:07
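
The two spellings are not quite interchangeable. In this sketch with hypothetical data where `name` is unique, the boolean filter returns a one-row DataFrame, whereas `.loc` with a scalar label on a unique index returns the row itself as a Series:

```python
import pandas as pd

# Hypothetical data; 'name' values are unique.
df = pd.DataFrame({"name": ["Alice", "Mary", "Bob"], "age": [31, 25, 40]})

# Filtering on a column always returns a DataFrame (here with one row).
row_via_mask = df[df["name"] == "Mary"]

# With 'name' as a unique index, a scalar label returns the row as a Series.
by_name = df.set_index("name")
row_via_loc = by_name.loc["Mary"]

print(type(row_via_mask).__name__, type(row_via_loc).__name__)  # DataFrame Series
print(row_via_loc["age"])  # 25
```

Passing a list, as in `by_name.loc[["Mary"]]`, returns a DataFrame instead, which is useful when downstream code expects a frame.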
- But it's just as easy to write `df.query("name == 'Mary'")`. – Alex Jun 22 '18 at 03:08
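
Moving `name` into the index does not cost you the `query` syntax: `DataFrame.query` can refer to a named index by its name, and to the index in general through the identifier `index`. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Mary", "Bob"], "age": [31, 25, 40]})
by_name = df.set_index("name")

# 'name' is now the index name, and query() resolves it like a column.
from_index = by_name.query("name == 'Mary'")

# The special identifier 'index' refers to the index regardless of its name.
also_from_index = by_name.query("index == 'Mary'")

# The original column-based version, for comparison.
from_column = df.query("name == 'Mary'")

print(from_index["age"].tolist(), from_column["age"].tolist())  # [25] [25]
```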
- Tastes differ, which is why this question is probably off topic on SO. – DYZ Jun 22 '18 at 03:08
- This isn't a matter of taste: in R's `data.table`, for example, there are very real reasons to set keys on a table (the conceptual equivalent of an index in pandas). I'm trying to understand whether the same is true in pandas. – Alex Jun 22 '18 at 03:10
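
The closest pandas counterpart to keying a `data.table` is, loosely, `set_index` on the key columns followed by `sort_index`; pandas has no `setkey`, so treat the analogy as approximate. The sketch below uses hypothetical panel data keyed by `(id, time)`, matching the example in the question. `verify_integrity=True` makes `set_index` raise a `ValueError` if the keys are not unique, which enforces the "identifies unique observations" idea directly:

```python
import pandas as pd

# Hypothetical panel data keyed by (id, time).
df = pd.DataFrame(
    {
        "id": [2, 1, 1, 2, 1],
        "time": [1, 2, 1, 2, 3],
        "value": [20.0, 12.0, 11.0, 22.0, 13.0],
    }
)

# Make (id, time) the index, rejecting duplicate keys, then sort by the keys.
keyed = df.set_index(["id", "time"], verify_integrity=True).sort_index()
print(keyed.index.is_monotonic_increasing)  # True

# Partial-key selection: every row for id 1, indexed by time.
print(keyed.loc[1])

# Full-key selection: a single (id, time) observation.
print(keyed.loc[(2, 2), "value"])  # 22.0

# Range slicing within an id relies on the MultiIndex being sorted.
print(keyed.loc[(1, 2):(1, 3)])
```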
- For `Series.interpolate` the index is important if you don't want to treat each row as evenly spaced. – ALollz Jun 22 '18 at 03:58
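
A minimal illustration with made-up observations at irregular times 0, 1 and 10. The default `method='linear'` ignores the index and treats the rows as equally spaced, while `method='index'` uses the index values as the x-coordinates:

```python
import numpy as np
import pandas as pd

# Observations at irregular times 0, 1 and 10; the value at time 1 is missing.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# Default: rows treated as evenly spaced, so the gap is filled halfway.
print(s.interpolate().tolist())  # [0.0, 5.0, 10.0]

# Index-aware: time 1 is 1/10 of the way from time 0 to time 10.
print(s.interpolate(method="index").tolist())  # [0.0, 1.0, 10.0]
```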
1 Answer
Lookups will probably be faster if you make the column an index. Check this out: https://stackoverflow.com/a/27238758/9968677

Muthu