When should one add a column as an index to a DataFrame in pandas?

Question

I am a bit confused about when one should make a column the index of a DataFrame. My understanding was that an index identifies unique observations within the df (for example, an id or a timestamp). But why make those an index instead of leaving them as ordinary columns? It looks as if pretty much every operation in pandas can be done using columns rather than indices (merge, selection using query, etc.). There must be specific cases where an index is beneficial: could someone provide some examples?

The complexity of selection by index value is much lower than if selecting by a boolean array (i.e. filtering by column values). See [this question](https://stackoverflow.com/questions/45240803) for more details — Marat, Jun 22 '18 at 02:42
Even if you sort the df by the column? I would think it would be exactly the same, no? Unless indices are somehow hashed... are they? — Alex, Jun 22 '18 at 02:55
Convenience is one of the reasons. It is easier to write `df.loc['Mary']` than `df[df['name']=='Mary']`. — DYZ, Jun 22 '18 at 03:07
Tastes differ. Which is why this question is probably off topic on SO. — DYZ, Jun 22 '18 at 03:08
This isn't a matter of taste: for example, in R's `data.table` there are very real reasons to set keys on a table (the conceptual equivalent of an index in pandas). I'm trying to understand whether the same applies in pandas. — Alex, Jun 22 '18 at 03:10
For `Series.interpolate` the index is important if you don't want to treat each row as evenly spaced. — ALollz, Jun 22 '18 at 03:58
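
To illustrate the interpolation point in the comments, here is a minimal sketch with made-up values (the series and its index are assumptions for illustration only). With `method="linear"` the index is ignored and rows are treated as evenly spaced, while `method="index"` uses the numeric index values as positions.

```python
import pandas as pd

# Hypothetical data: observations at positions 0, 1 and 10, with the middle one missing.
s = pd.Series([0.0, None, 10.0], index=[0, 1, 10])

# "linear" ignores the index and treats the rows as equally spaced.
linear = s.interpolate(method="linear")

# "index" interpolates using the actual index values as x-coordinates.
by_index = s.interpolate(method="index")

print(linear.loc[1])    # halfway between 0.0 and 10.0
print(by_index.loc[1])  # one tenth of the way from 0.0 to 10.0
```

If those positions were kept in an ordinary column instead of the index, the `method="index"` call would have nothing to work from.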

score 0 · Answer 1 · answered Jun 22 '18 at 05:12

Lookups will probably be faster if you make the column an index. See this answer for details: https://stackoverflow.com/a/27238758/9968677
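
As a rough sketch of the difference (the frame and its `name`/`age` columns are made up for illustration), filtering by a column builds a boolean mask over every row, whereas `.loc` on an index goes through the index's own lookup. As far as I understand it, a unique index can use a hash-based lookup, so the gain should grow with the size of the frame. Time it on your own data before relying on this.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"name": ["Mary", "John", "Alice"], "age": [31, 45, 27]})

# Column-based selection: compares every row to build a boolean mask.
by_column = df[df["name"] == "Mary"]

# Index-based selection: set the column as the index once, then look up by label.
indexed = df.set_index("name")
by_index = indexed.loc["Mary"]

print(by_column["age"].iloc[0])  # age found via the boolean mask
print(by_index["age"])           # same age found via the index
print(indexed.index.is_unique)   # unique labels are the favourable case for lookups
```

Note that `set_index` has its own cost, so this only pays off when the same index is used for many lookups.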

Muthu
