Select specific CSV columns (Filtering) - Python/pandas

Question

I have a very large CSV file with 100 columns. To illustrate my problem, I will use a very basic example.

Suppose we have the following CSV file:

in  value   d     f
0    975   f01    5
1    976   F      4
2    977   d4     1
3    978   B6     0
4    979   2C     0

I want to select specific columns. I read the file with:

import pandas
data = pandas.read_csv("ThisFile.csv")
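To make the example reproducible without the real ThisFile.csv, the sample table can be loaded from an in-memory string. This is a minimal sketch that assumes the file is comma-separated with a header row; `io.StringIO` stands in for the file path, and `csv_text` is a hypothetical name used only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for ThisFile.csv: the sample table from the question
# written as comma-separated text (assumption: commas plus a header row).
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# read_csv accepts any file-like object, so StringIO can replace a path.
data = pd.read_csv(io.StringIO(csv_text))
print(data)
print(data.columns.tolist())  # ['in', 'value', 'd', 'f']
```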

To select the first two columns, I used:

data.ix[:,:2]   # .ix was removed in pandas 1.0; the positional equivalent is data.iloc[:, :2]

How can I select columns that are not adjacent, such as the 2nd and the 4th?

Another option would be to rewrite the CSV file, but it is a huge file, so I would rather avoid that.

Can't you just use, say, `data.value` and `data.f`? Is that what you're asking for? – Higgs, Mar 14 '14 at 01:49

unutbu · Answer 1 · score 21 · 2014-03-14T01:55:50.087

This selects the second and fourth columns (since Python uses 0-based indexing):

In [272]: df.iloc[:,(1,3)]
Out[272]: 
   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0

[5 rows x 2 columns]

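df.ix can select by location or by label, whereas df.iloc always selects by location. (df.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so today you would use df.iloc for positions and df.loc for labels.) When indexing by location, use df.iloc to signal your intention explicitly. It is also a bit faster, since pandas does not have to check whether your index uses labels.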
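A small sketch of the location-versus-label distinction, using the question's sample data built inline (the names `by_position`, `by_label` and `shifted` are illustrative only). A list `[1, 3]` is used as the column indexer here; the answer's tuple form is not exercised.

```python
import pandas as pd

# The question's sample data, built inline for illustration.
df = pd.DataFrame({
    "in": [0, 1, 2, 3, 4],
    "value": [975, 976, 977, 978, 979],
    "d": ["f01", "F", "d4", "B6", "2C"],
    "f": [5, 4, 1, 0, 0],
})

# iloc is purely positional: the columns at positions 1 and 3.
by_position = df.iloc[:, [1, 3]]

# loc is purely label-based: the same two columns, named explicitly.
by_label = df.loc[:, ["value", "f"]]
print(by_position.equals(by_label))  # True

# The difference matters once the row labels are not 0..n-1.
shifted = df.copy()
shifted.index = [10, 11, 12, 13, 14]
print(shifted.iloc[0, 1])         # first row, second column, by position
print(shifted.loc[10, "value"])   # row labelled 10, column labelled "value"
```

Both lookups on `shifted` return 975, but only because row position 0 happens to carry label 10; `shifted.loc[0, "value"]` would fail, since no row is labelled 0.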

Another possibility is to use the usecols parameter:

data = pandas.read_csv("ThisFile.csv", usecols=[1,3])

This will load only the second and fourth columns into the data DataFrame.
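A sketch of a few `usecols` forms, reading the sample table from an in-memory string (assumption: the real file is comma-separated with a header). As documented for `read_csv`, element order in `usecols` is ignored, so the result always comes back in file order; a callable is evaluated against each column name.

```python
import io

import pandas as pd

# Hypothetical stand-in for ThisFile.csv.
csv_text = "in,value,d,f\n0,975,f01,5\n1,976,F,4\n2,977,d4,1\n3,978,B6,0\n4,979,2C,0\n"

# Positions: only columns 1 and 3 are kept.
a = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3])

# Order inside usecols is ignored, so [3, 1] still yields file order.
b = pd.read_csv(io.StringIO(csv_text), usecols=[3, 1])
print(a.columns.tolist(), b.columns.tolist())  # ['value', 'f'] ['value', 'f']

# A callable receives each column name and keeps those it returns True for.
c = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name in {"value", "f"})
print(c.columns.tolist())  # ['value', 'f']
```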

edited Mar 14 '14 at 01:55

answered Mar 14 '14 at 01:50

unutbu

Thanks! One last thing: when I tried iloc, I got this error: "IndexError: too many indices" – user3378649 Mar 14 '14 at 02:11
You might have gotten that error, "Too many *indexers*", if the parentheses were omitted, as in `df.iloc[:,1,3]`. – unutbu Mar 14 '14 at 09:12
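A sketch reproducing the mistake described in the reply above. The exact exception class and message have varied between pandas versions, so this catches a generic `Exception` instead of naming one; `error` and `ok` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "in": [0, 1, 2, 3, 4],
    "value": [975, 976, 977, 978, 979],
    "d": ["f01", "F", "d4", "B6", "2C"],
    "f": [5, 4, 1, 0, 0],
})

# Without inner brackets, pandas receives three indexers (":", 1, 3)
# for a two-dimensional DataFrame and raises.
error = None
try:
    df.iloc[:, 1, 3]
except Exception as exc:  # class/message depend on the pandas version
    error = exc
print(type(error).__name__, error)

# Grouping the column positions into one list gives the intended selection.
ok = df.iloc[:, [1, 3]]
print(ok.columns.tolist())  # ['value', 'f']
```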

score 10 · Answer 2 · answered Mar 14 '14 at 02:48

If you would rather select columns by name, you can use:

data[['value','f']]

   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0
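A short sketch of how name-based selection behaves, built on the sample data (variable names are illustrative): a list of names returns a DataFrame in the order listed, a single name returns a Series, and `.loc` does the same label-based selection while also accepting a row selector.

```python
import pandas as pd

data = pd.DataFrame({
    "in": [0, 1, 2, 3, 4],
    "value": [975, 976, 977, 978, 979],
    "d": ["f01", "F", "d4", "B6", "2C"],
    "f": [5, 4, 1, 0, 0],
})

# A list of names returns a DataFrame, with columns in the order listed.
subset = data[["f", "value"]]
print(type(subset).__name__, subset.columns.tolist())  # DataFrame ['f', 'value']

# A single name (no inner list) returns a Series instead.
col = data["value"]
print(type(col).__name__)  # Series

# .loc selects by label; label slices include the end point (rows 0 and 1).
first_two = data.loc[:1, ["value", "f"]]
print(first_two)
```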

Wai Yip Tung

score 1 · Answer 3 · answered Jun 15 '19 at 16:38

As Wai Yip Tung said, you can select the columns you need by name right after reading the file, for example:

import pandas as pd
data = pd.read_csv("ThisFile.csv")[['value','d']]

This solved my problem.
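Since the question is about a huge file, note that the chained form above still parses every column before `[['value','d']]` discards the rest, while `usecols` skips unwanted columns during parsing (which the pandas docs describe as faster and lighter on memory). A sketch comparing the two on an in-memory copy of the sample (assumption: comma-separated with a header); because `usecols` ignores list order, indexing afterwards restores a specific order.

```python
import io

import pandas as pd

# Hypothetical stand-in for ThisFile.csv.
csv_text = "in,value,d,f\n0,975,f01,5\n1,976,F,4\n2,977,d4,1\n3,978,B6,0\n4,979,2C,0\n"

# Chained selection: every column is parsed, then two are kept.
after = pd.read_csv(io.StringIO(csv_text))[["value", "d"]]

# usecols: only the named columns are parsed.
during = pd.read_csv(io.StringIO(csv_text), usecols=["value", "d"])
print(after.equals(during))  # True

# usecols returns file order; index afterwards if another order is needed.
ordered = pd.read_csv(io.StringIO(csv_text), usecols=["d", "value"])[["d", "value"]]
print(ordered.columns.tolist())  # ['d', 'value']
```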

dasilvadaniel
