In pandas, how do I call set_index using column positions instead of column names?

Question

For example:

We have a pandas DataFrame foo with 2 columns ['A', 'B'].

I want to do something like foo.set_index([0,1]) instead of foo.set_index(['A', 'B']).

I have also tried foo.set_index([[0,1]]), but it fails with this error:

Length mismatch: Expected axis has 9 elements, new values have 2 elements

unutbu · Accepted Answer · 2016-06-29T12:09:47.563

14

If the column index is unique you could use:

df.set_index(list(df.columns[cols]))

where cols is a list of ordinal indices.

For example,

In [77]: np.random.seed(2016)

In [79]: df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('ABCD'))

In [80]: df
Out[80]: 
   A  B  C  D
0  3  7  2  3
1  8  4  8  7
2  9  2  6  3
3  4  1  9  1
4  2  2  8  9

In [81]: df.set_index(list(df.columns[[0,2]]))
Out[81]: 
     B  D
A C      
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
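
The same idea can be wrapped in a small function. This is only a sketch: the name set_index_by_position is hypothetical (it is not part of pandas), and the uniqueness check is an assumption about how you would want the non-unique case handled. The data is copied from the example above.

```python
import pandas as pd


def set_index_by_position(df, positions):
    """Hypothetical helper: set the index from ordinal column positions."""
    # Positions can only be translated to labels safely if labels are unique.
    if not df.columns.is_unique:
        raise ValueError("column labels are not unique")
    # Look up the labels at the requested positions, then set the index by label.
    labels = list(df.columns[list(positions)])
    return df.set_index(labels)


df = pd.DataFrame(
    [[3, 7, 2, 3], [8, 4, 8, 7], [9, 2, 6, 3], [4, 1, 9, 1], [2, 2, 8, 9]],
    columns=list("ABCD"),
)

# Columns at positions 0 and 2 ("A" and "C") become the index levels.
result = set_index_by_position(df, [0, 2])
print(result)
```

Raising on duplicate labels keeps the helper simple; the non-unique case needs the relabelling approach described next.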

If the DataFrame's column labels are not unique, then setting the index by label is not possible, and setting it by ordinal position is more complicated:

import numpy as np
import pandas as pd
np.random.seed(2016)

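def set_ordinal_index(df, cols):
    # Work on a copy so the caller's column labels are left untouched
    df = df.copy()
    # Temporarily relabel the columns 0..n-1 so each position is a unique label
    columns, df.columns = df.columns, np.arange(len(df.columns))
    mask = df.columns.isin(cols)
    df = df.set_index(cols)
    # Restore the original labels on the remaining columns
    df.columns = columns[~mask]
    # Name the index levels in the same order as cols
    df.index.names = columns[list(cols)]
    return df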

df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('AAAA'))
print(set_ordinal_index(df, [0,2]))

yields

     A  A
A A      
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9

edited Jun 29 '16 at 12:09

answered Jun 28 '16 at 00:14

unutbu


What if it's a range of columns, such as columns 0 through 10? I tried df.columns[[0:10]] and it threw an error. I'm trying to avoid typing every column integer. – Chris Aug 01 '18 at 23:13

@Chris: `df.columns` is a [sequence](https://docs.python.org/3/glossary.html#term-sequence) so `df.columns[:10]` selects the first 10 column labels. – unutbu Aug 02 '18 at 01:08
Okay, and for my own knowledge, what about columns 3 through 10? – Chris Aug 02 '18 at 12:06

That would be `df.columns[2:10]`, which selects the 3rd through 10th column labels. Since [Python uses 0-based indexing](https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi), index 2 indicates the 3rd item in the sequence. See [Understanding Python's slice notation](https://stackoverflow.com/q/509211/190597). – unutbu Aug 02 '18 at 12:09

A helpful thing to know about Python slicing notation is that `seq[start:end]` will return `end-start` items from `seq` starting with the item at index `start` (or fewer items if `seq` does not contain that many items). So `df.columns[2:10]` will return 8 items starting with the 3rd column label: 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th is 8 items. – unutbu Aug 02 '18 at 12:19
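
To make the slicing comments above concrete, here is a small sketch on a made-up 12-column frame (the data and the variable names are assumptions for illustration only). The last line uses `np.r_`, which is not mentioned in the thread, as one possible way to mix single positions and ranges.

```python
import numpy as np
import pandas as pd

# Toy frame with 12 columns labelled A..L (illustrative data only).
df = pd.DataFrame(np.arange(24).reshape(2, 12), columns=list("ABCDEFGHIJKL"))

# First 10 column labels: positions 0 through 9.
first_ten = list(df.columns[:10])

# 3rd through 10th column labels: positions 2 through 9, i.e. 10 - 2 = 8 items.
third_to_tenth = list(df.columns[2:10])

# Use the sliced labels as the index; the other columns stay as data.
indexed = df.set_index(third_to_tenth)

# np.r_ concatenates single positions and slices into one integer array.
mixed = list(df.columns[np.r_[0, 3:5]])

print(third_to_tenth)
print(list(indexed.columns))
print(mixed)
```

Note that the slice goes inside a single pair of brackets: `df.columns[[0:10]]` is a syntax error because a slice cannot appear inside a list literal.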

score 2 · Answer 2 · answered Oct 25 '20 at 02:15


This worked for me; the other answer didn't.

# single column
df.set_index(df.columns[1])
# multi column
df.set_index(df.columns[[1, 0]].tolist())

citynorman
