Selecting multiple dataframe columns by position in pandas

Question

I have a (large) dataframe. How can I select specific columns by position? e.g. columns 1..3, 5, 6

Rather than just drop column4, I am trying to do it in this way because there are a ton of rows in my dataset and I want to select by position:

 df=df[df.columns[0:2,4:5]]

but that gives IndexError: too many indices for array

DF input

 Col1     Col2     Col3       Col4        Col5       Col6
 1        apple    tomato     pear        banana     banana
 1        apple    grape      nan         banana     banana
 1        apple    nan        banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        avacado  tomato     banana      banana     banana
 1        toast    tomato     banana      banana     banana
 1        grape    tomato     egg         banana     banana

DF output - desired

 Col1     Col2     Col3       Col5       Col6
 1        apple    tomato     banana     banana
 1        apple    grape      banana     banana
 1        apple    nan        banana     banana
 1        apple    tomato     banana     banana
 1        apple    tomato     banana     banana
 1        apple    tomato     banana     banana
 1        avacado  tomato     banana     banana
 1        toast    tomato     banana     banana
 1        grape    tomato     banana     banana

@djk47463: that question uses a list of names, this one uses a list of indices. I retitled both questions to make the slight difference clear. — smci, Jan 31 '18 at 21:26
@smci The accepted answer in the linked question uses positions, not labels — Paul H, Jan 31 '18 at 21:41
@smci, that’s not really true, the question was broken into two parts, 1 is by column indexes, the second by label. Most of the answers here are highly correlated with the answers there... — DJK, Jan 31 '18 at 22:06
@djk47463: well then the question's confused, we could delete the first half without losing anything. What should we do? Let me try to edit it for clarity. — smci, Jan 31 '18 at 22:14

BENY · Answer 1 · 2018-01-31T15:14:29.027

17

What you need is NumPy's np.r_, which concatenates several slices into one array of integer positions:

df.iloc[:,np.r_[0:2,4:5]]
Out[265]: 
   Col1     Col2    Col5
0     1    apple  banana
1     1    apple  banana
2     1    apple  banana
3     1    apple  banana
4     1    apple  banana
5     1    apple  banana
6     1  avacado  banana
7     1    toast  banana
8     1    grape  banana
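As a rough sketch of how this extends to the output the question asks for (columns 1 to 3, 5 and 6), the slices can be widened to 0:3 and 4:6. The two-row frame below is a made-up stand-in for the question's data, and the names positions and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in with the question's six column names.
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"],
     [1, "apple", "grape", None, "banana", "banana"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

# np.r_ turns the two slices into a single integer array: [0, 1, 2, 4, 5].
positions = np.r_[0:3, 4:6]

# Select every row and only the columns at those zero-based positions.
result = df.iloc[:, positions]
print(list(result.columns))
```

Slice ends are exclusive, as in plain Python, so 0:3 covers positions 0, 1 and 2, and position 3 (Col4) is skipped.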

edited Jan 31 '18 at 15:14

answered Jan 31 '18 at 15:11

BENY


1

learn a new trick :P – Tai Jan 31 '18 at 15:13

score 6 · Answer 2 · answered Jan 31 '18 at 14:58

You can select columns 0, 1, 4 in this way:

df.iloc[:, [0, 1, 4]]

You can read more about this in Indexing and Selecting Data.

• iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with python/numpy slice semantics). Allowed inputs are:

◦ An integer e.g. 5

◦ A list or array of integers [4, 3, 0]

◦ A slice object with ints 1:7

◦ A boolean array

◦ A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above)

a few ways: for columns 1 to 5, you can use `1:6`, for 1 to 5 and 7 to 10, you can create a list: `list(range(1, 6)) + list(range(7, 11))`, etc. — jpp, Jan 31 '18 at 15:07
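A minimal sketch of the range-based variant from the comment, together with the out-of-bounds behaviour described in the quoted documentation. The one-row frame and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical one-row stand-in with the question's six column names.
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

# Build the positions with plain Python: [0, 1, 2] + [4, 5].
positions = list(range(0, 3)) + list(range(4, 6))
subset = df.iloc[:, positions]

# A slice may run past the last column; it is silently clipped.
clipped = df.iloc[:, 4:100]

# A list of positions is checked strictly: position 6 does not exist here.
try:
    df.iloc[:, [0, 6]]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True

print(list(subset.columns), list(clipped.columns), out_of_bounds_raised)
```

This needs no NumPy import, at the cost of slightly more verbose range arithmetic.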

score 3 · Answer 3 · answered Jan 31 '18 at 15:41

You can also build two ranges of positions, combine them with np.concatenate, and use the result to index df.columns:

import numpy as np
df = df[df.columns[np.concatenate([range(0,3),range(4,6)])]]
df

Output:

   Col1     Col2    Col3    Col5    Col6
0     1    apple  tomato  banana  banana
1     1    apple   grape  banana  banana
2     1    apple     NaN  banana  banana
3     1    apple  tomato  banana  banana
4     1    apple  tomato  banana  banana
5     1    apple  tomato  banana  banana
6     1  avacado  tomato  banana  banana
7     1    toast  tomato  banana  banana
8     1    grape  tomato  banana  banana

score 2 · Answer 4 · answered Jan 31 '18 at 15:01


Use the pandas iloc indexer. Positions are zero-based, so columns 1 to 3, 5 and 6 are positions 0, 1, 2, 4 and 5:

df_filtered = df.iloc[:, [0, 1, 2, 4, 5]]


YanSym


Tai · Answer 5 · 2018-01-31T15:38:39.213

2

The error the OP faces comes from df.columns[0:2,4:5], which passes two indices to a one-dimensional index. IIUC, you can gather all the column names you need and select with them.

from itertools import chain
cols_to_select = list(chain(df.columns[0:2], df.columns[4:5]))
df_filtered = df[cols_to_select]

If cols_to_select can contain duplicate column names, select with iloc as jp_data_analysis suggested, or with np.r_ as Wen suggested.
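To illustrate why duplicate names matter, here is a small sketch on a made-up frame in which two columns share the label "a". The frame and the variable names are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical frame where the first and third columns are both named "a".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Label-based selection: taking the name of the first column only...
cols_to_select = list(df.columns[0:1])
by_label = df[cols_to_select]

# ...still returns every column carrying that name.
print(by_label.shape)

# Position-based selection returns exactly the one column requested.
by_position = df.iloc[:, [0]]
print(by_position.shape)
```

With unique column names both approaches give the same result, so the difference only shows up when labels repeat.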

edited Jan 31 '18 at 15:38

answered Jan 31 '18 at 15:10

Tai


Aha, this is good :-) – BENY Jan 31 '18 at 15:12
