Get column index from column name in python pandas

Question

In R when you need to retrieve a column index based on the name of the column you could do

idx <- which(names(my_data)==my_colum_name)

Is there a way to do the same with pandas dataframes?

score 631 · Accepted Answer · answered Oct 23 '12 at 00:06

631

Sure, you can use .get_loc():

In [45]: df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

In [46]: df.columns
Out[46]: Index([apple, orange, pear], dtype=object)

In [47]: df.columns.get_loc("pear")
Out[47]: 2

although to be honest I don't often need this myself. Usually access by name does what I want it to (df["pear"], df[["apple", "orange"]], or maybe df.columns.isin(["orange", "pear"])), although I can definitely see cases where you'd want the index number.

answered Oct 23 '12 at 00:06

DSM

342,061
65
592
494

16

Column number is useful when using `.iloc` operator, where you must pass only integers for both rows and columns. – abe Sep 08 '17 at 18:52
11

Or when using libraries which want the DF converted to a numpy array and indices of columns with particular features. For example CatBoost wants a list of indices of categorical features. – Tom Walker Oct 13 '17 at 05:35
4

Is there a way to get list of indexes? – haneulkim Sep 03 '21 at 02:55
In my case I want to use the index of the column get values of "itertuples" by column name. Fetching the indeces of the column names instead of hardcoding keeps it dynamic in case of changes to the DF. – cyclux Mar 10 '22 at 11:00
1

it's also useful for the insert function (to enable you to insert after a given column) – seanv507 Nov 25 '22 at 16:43
This works good for superficially figuring out the column number in an excel file based on a df column header, after writing that df with xlsxwriter. – MrChadMWood Aug 21 '23 at 20:32

score 74 · Answer 2 · edited Jan 22 '19 at 02:52

74

Here is a solution through list comprehension. cols is the list of columns to get index for:

[df.columns.get_loc(c) for c in cols if c in df]

edited Jan 22 '19 at 02:52

cs95

379,657
97
704
746

answered Sep 09 '17 at 08:20

snovik

1,027
10
14

score 20 · Answer 3 · edited Nov 18 '22 at 23:06

For returning multiple column indices, I recommend using the pandas.Index method get_indexer, if you have unique labels:

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})
df.columns.get_indexer(['pear', 'apple'])
# Out: array([0, 1], dtype=int64)

If you have non-unique labels in the index (columns only support unique labels) get_indexer_for. It takes the same args as get_indexer:

df = pd.DataFrame(
    {"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]}, 
    index=[0, 1, 1])
df.index.get_indexer_for([0, 1])
# Out: array([0, 1, 2], dtype=int64)

Both methods also support non-exact indexing with, f.i. for float values taking the nearest value with a tolerance. If two indices have the same distance to the specified label or are duplicates, the index with the larger index value is selected:

df = pd.DataFrame(
    {"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]},
    index=[0, .9, 1.1])
df.index.get_indexer([0, 1])
# array([ 0, -1], dtype=int64)

score 18 · Answer 4 · answered Oct 23 '12 at 18:27

18

DSM's solution works, but if you wanted a direct equivalent to which you could do (df.columns == name).nonzero()

answered Oct 23 '12 at 18:27

Wes McKinney

101,437
32
142
108

Divakar · Answer 5 · 2017-09-04T05:40:03.060

When you might be looking to find multiple column matches, a vectorized solution using searchsorted method could be used. Thus, with df as the dataframe and query_cols as the column names to be searched for, an implementation would be -

def column_index(df, query_cols):
    cols = df.columns.values
    sidx = np.argsort(cols)
    return sidx[np.searchsorted(cols,query_cols,sorter=sidx)]

Sample run -

In [162]: df
Out[162]: 
   apple  banana  pear  orange  peach
0      8       3     4       4      2
1      4       4     3       0      1
2      1       2     6       8      1

In [163]: column_index(df, ['peach', 'banana', 'apple'])
Out[163]: array([4, 1, 0])

mallet · Answer 6 · 2023-01-11T12:36:29.647

13

Update: "Deprecated since version 0.25.0: Use np.asarray(..) or DataFrame.values() instead." pandas docs

In case you want the column name from the column location (the other way around to the OP question), you can use:

>>> df.columns.values()[location]

Using @DSM Example:

>>> df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

>>> df.columns

Index(['apple', 'orange', 'pear'], dtype='object')

>>> df.columns.values()[1]

'orange'

Other ways:

df.iloc[:,1].name

df.columns[location] #(thanks to @roobie-nuby for pointing that out in comments.)

edited Jan 11 '23 at 12:36

answered Mar 02 '18 at 11:35

mallet

2,454
3
37
64

1

`df.columns` is a useful way to access a specific col of a csv – Sheldon Jul 12 '22 at 13:09
`AttributeError: 'Index' object has no attribute 'get_values'` (python 3.8.8 pandas 1.3.1) – Sanjay Manohar Jan 09 '23 at 00:36
@SanjayManohar get_values deprecated since pandas version 0.25.0: Use np.asarray(..) or DataFrame.values() instead. (answer updated) – mallet Jan 11 '23 at 12:36

score 9 · Answer 7 · answered Dec 23 '20 at 16:34

To modify DSM's answer a bit, get_loc has some weird properties depending on the type of index in the current version of Pandas (1.1.5) so depending on your Index type you might get back an index, a mask, or a slice. This is somewhat frustrating for me because I don't want to modify the entire columns just to extract one variable's index. Much simpler is to avoid the function altogether:

list(df.columns).index('pear')

Very straightforward and probably fairly quick.

score 5 · Answer 8 · answered Jan 04 '22 at 22:16

5

When the column might or might not exist, then the following (variant from above works.

ix = 'none'
try:
     ix = list(df.columns).index('Col_X')
except ValueError as e:
     ix = None  
     pass

if ix is None:
   # do something

answered Jan 04 '22 at 22:16

QuentinJS

162
1
9

score 4 · Answer 9 · answered Nov 15 '19 at 07:01

4

how about this:

df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
out = np.argwhere(df.columns.isin(['apple', 'orange'])).ravel()
print(out)
[1 2]

answered Nov 15 '19 at 07:01

Siraj S.

3,481
3
34
48

Shawn Seamons · Answer 10 · 2021-08-20T23:37:24.660

import random
def char_range(c1, c2):                      # question 7001144
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)      
df = pd.DataFrame()
for c in char_range('a', 'z'):               
    df[f'{c}'] = random.sample(range(10), 3) # Random Data
rearranged = random.sample(range(26), 26)    # Random Order
df = df.iloc[:, rearranged]
print(df.iloc[:,:15])                        # 15 Col View         

for col in df.columns:             # List of indices and columns
    print(str(df.columns.get_loc(col)) + '\t' + col)

![Results](Results

Get column index from column name in python pandas

10 Answers10

Linked

Related