Suppose I have a dataframe df such as:

    item   A    B
0   foo    3    4
1   bar    8    7
2   baz    1    2

I can set the index as follows:

new_df = df.set_index('item')

item   A    B
foo    3    4
bar    8    7
baz    1    2

However, assuming 'item' will always be unique (in fact, I want it to be unique to avoid errors during analysis), what are the benefits of replacing the default index with a column of my choosing?
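
For reference, set_index does not check for duplicates by default, so the uniqueness requirement needs to be requested explicitly. A minimal sketch, assuming the example frame above and using the verify_integrity parameter of set_index; the dup_df frame is made up purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"item": ["foo", "bar", "baz"], "A": [3, 8, 1], "B": [4, 7, 2]})

# verify_integrity=True makes set_index raise ValueError on duplicate keys
new_df = df.set_index("item", verify_integrity=True)
print(new_df.index.is_unique)

# Hypothetical frame with a repeated 'item' value, for illustration only
dup_df = pd.DataFrame({"item": ["foo", "foo"], "A": [1, 2], "B": [3, 4]})
try:
    dup_df.set_index("item", verify_integrity=True)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Without verify_integrity=True the second call would succeed silently, so the check is worth making explicit if uniqueness matters.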

Other than ensuring the indexed column contains unique values (which is important in my case), I can currently see only disadvantages to setting an index. For example, I can no longer filter on the indexed column using loc. This doesn't work:

# raises KeyError: 'item' is now the index, not a column
filtered_df = new_df.loc[new_df['item'] == 'foo']
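
For comparison, the same selection can be written against the index itself. This is a sketch under the assumption that the frame matches the example above, not a claim about which form is preferable:

```python
import pandas as pd

df = pd.DataFrame({"item": ["foo", "bar", "baz"], "A": [3, 8, 1], "B": [4, 7, 2]})
new_df = df.set_index("item")

# Label lookup: a single label returns the row as a Series
row = new_df.loc["foo"]

# A list of labels keeps the result as a DataFrame
one_row_df = new_df.loc[["foo"]]

# Boolean mask built from the index instead of a column
mask_df = new_df.loc[new_df.index == "foo"]

# reset_index turns 'item' back into an ordinary column if column-style filtering is needed
back = new_df.reset_index()
filtered = back.loc[back["item"] == "foo"]
```

The label-based forms are what an index adds over a plain column; the mask and reset_index forms show that column-style filtering is still reachable.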

I've never set an index in pandas before. Does it have any benefits I'm missing, such as faster lookups or special methods?

Alan