Extracting specific columns from pandas.dataframe

Question

I'm trying to use Python to read my CSV file, extract specific columns into a pandas.DataFrame, and show that DataFrame. However, I don't see the DataFrame; instead I get `Series([], dtype: object)` as output.

My file has these columns: product, sub_product, issue, sub_issue, consumer_complaint_narrative, company_public_response, company, state, zipcode, tags, consumer_consent_provided, submitted_via, date_sent_to_company, company_response_to_consumer, timely_response, consumer_disputed?, complaint_id

I want to extract: sub_product, issue, sub_issue, consumer_complaint_narrative

Below is the code I'm working with:

import pandas as pd

df=pd.read_csv("C:\\....\\consumer_complaints.csv")
df=df.stack(level=0)
df2 = df.filter(regex='[B-F]')
df[df2]

When you extract only one column, it automatically becomes a `Series`. Do you want to force it to be a DataFrame? — amrrs, Feb 06 '18 at 11:12
You can simply use it like this: `df2 = df[['b','c','d','e','f']]`. Why are you using a regex? — amrrs, Feb 06 '18 at 11:18
I think you need `df=pd.read_csv("C:\\....\\consumer_complaints.csv")` and `print (df.loc[:, 'B':'F'])`. — jezrael, Feb 06 '18 at 11:22
But it would be best to add some sample data to the question; please check [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — jezrael, Feb 06 '18 at 11:23
@jezrael I've already tried that. It doesn't work; it gives a hashtable error. — Yags, Feb 06 '18 at 11:25

kepy97 · Accepted Answer · score 58 · 2018-02-06T11:48:55.770

import pandas as pd

input_file = "C:\\....\\consumer_complaints.csv"
df = pd.read_csv(input_file)  # read_csv already returns a DataFrame
cols = [1, 2, 3, 4]           # zero-based positions of the columns to keep
df = df[df.columns[cols]]     # select those columns by position

Put the positions of the columns you want to select in the cols list. Column positions in a DataFrame start at index 0.
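A minimal sketch of the positional approach, using a small hypothetical DataFrame with the question's column names in place of the real CSV. It also checks that `df.iloc[:, cols]` (pandas' positional indexer, which the answer does not use) gives the same result as `df[df.columns[cols]]`:

```python
import pandas as pd

# Hypothetical two-row sample standing in for consumer_complaints.csv
df = pd.DataFrame({
    "product": ["Mortgage", "Debt collection"],
    "sub_product": ["FHA mortgage", "Credit card"],
    "issue": ["Servicing", "Disclosure"],
    "sub_issue": ["Other", "Not given"],
    "consumer_complaint_narrative": ["text A", "text B"],
    "complaint_id": [101, 102],
})

cols = [1, 2, 3, 4]                # zero-based column positions
by_columns = df[df.columns[cols]]  # the answer's approach: positions -> labels -> columns
by_iloc = df.iloc[:, cols]         # positional indexer, same selection

print(list(by_iloc.columns))
# ['sub_product', 'issue', 'sub_issue', 'consumer_complaint_narrative']
```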

You can also select columns by name. Just use the following line:

df = df[["Column Name","Column Name2"]]
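A small sketch of name-based selection on hypothetical sample data. The outer brackets index the DataFrame and the inner brackets build an ordinary Python list of names; the result keeps the order of that list rather than the order in the file:

```python
import pandas as pd

# Hypothetical sample with a few of the question's columns
df = pd.DataFrame({
    "product": ["Mortgage", "Student loan"],
    "sub_product": ["FHA mortgage", "Private"],
    "issue": ["Servicing", "Repaying"],
    "complaint_id": [101, 102],
})

wanted = ["issue", "sub_product"]  # any order you like
subset = df[wanted]                # outer [] indexes df, inner [] is the list of names

print(list(subset.columns))   # ['issue', 'sub_product']
print(type(subset).__name__)  # DataFrame
```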

edited Feb 06 '18 at 11:48

answered Feb 06 '18 at 11:25

kepy97


Thank you for your help. However, I still receive the hashtable error. – Yags Feb 06 '18 at 11:37
df=df["product", "sub_product", "issue", "sub_issue", "consumer_complaint_narrative", "complaint_id"]
Traceback (most recent call last):
  File "", line 1, in
    df=df["product", "sub_product", "issue", "sub_issue", "consumer_complaint_narrative", "complaint_id"]
KeyError: ('product', 'sub_product', 'issue', 'sub_issue', 'consumer_complaint_narrative', 'complaint_id')
– Yags Feb 06 '18 at 11:46
I know it's reading the whole file and creating the DataFrame. The DataFrame exists; I just can't choose specific columns. – Yags Feb 06 '18 at 11:47

You have to use double square brackets: df=df[["product", "sub_product", "issue", "sub_issue", "consumer_complaint_narrative", "complaint_id"]] – kepy97 Feb 06 '18 at 11:48
I hope the comment above solves your problem. If any other error occurs, let me know. – kepy97 Feb 06 '18 at 11:50
Thank you @kepy97, that worked. Why do we have to use two square brackets? – Yags Feb 06 '18 at 11:51

In a DataFrame, a single pair of brackets with one column name returns a Series. If you put several names inside a single pair of brackets, pandas treats them as one tuple key, finds no column with that label, and raises a "KeyError". To select multiple columns, you pass a list of names, hence the second pair of brackets. Please [refer](http://pandas.pydata.org/pandas-docs/stable/indexing.html) to this document for more details. – kepy97 Feb 06 '18 at 12:03
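To illustrate the comment above, a minimal sketch on a hypothetical one-row DataFrame: one name gives a `Series`, a list (even with a single name) gives a `DataFrame`, and several names inside single brackets are read as one tuple key, which produces the same kind of `KeyError` as in the traceback above:

```python
import pandas as pd

# Hypothetical one-row sample
df = pd.DataFrame({"product": ["Mortgage"], "issue": ["Servicing"], "complaint_id": [101]})

one = df["product"]              # single name -> Series
one_df = df[["product"]]         # list with one name -> DataFrame
many = df[["product", "issue"]]  # list of names -> DataFrame

try:
    df["product", "issue"]       # same as df[("product", "issue")]: a single tuple key
    error = None
except KeyError as exc:          # no column is labelled ("product", "issue")
    error = exc

print(type(one).__name__, type(one_df).__name__, type(many).__name__)
# Series DataFrame DataFrame
print(repr(error))
```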

score 5 · Answer 2 · answered Feb 06 '18 at 11:23


A simple way to achieve this would be as follows:

import pandas as pd
df = pd.read_csv("C:\\....\\consumer_complaints.csv")
# 'B' and 'F' stand for real column labels; a .loc slice includes both ends
df2 = df.loc[:,'B':'F']

Hope that helps.
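A sketch of what `.loc` label slicing does, assuming a hypothetical one-row frame whose columns follow the order listed in the question. The slice includes both endpoints, and 'B' and 'F' only work if they are real column labels. When they are not, and the labels are not sorted, pandas raises a `KeyError`, which may be what the asker reports as a hashtable error:

```python
import pandas as pd

# Hypothetical frame using the question's column order
cols = ["product", "sub_product", "issue", "sub_issue",
        "consumer_complaint_narrative", "complaint_id"]
df = pd.DataFrame([["p", "sp", "i", "si", "n", 1]], columns=cols)

# .loc slices by label and includes BOTH endpoints
middle = df.loc[:, "sub_product":"consumer_complaint_narrative"]
print(list(middle.columns))
# ['sub_product', 'issue', 'sub_issue', 'consumer_complaint_narrative']

# 'B' and 'F' are not labels here and the labels are unsorted,
# so pandas cannot locate the slice bounds and raises KeyError
try:
    df.loc[:, "B":"F"]
    failed = False
except KeyError:
    failed = True
print(failed)
```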

answered Feb 06 '18 at 11:23

PaW


Sorry, I've tried this multiple times and it doesn't work. It gives a hashtable error. – Yags Feb 06 '18 at 11:37
In that case the problem may be in the data. Look at the contents of the csv file. – PaW Feb 06 '18 at 11:48

MSIS · Answer 3 · 2021-12-15T23:36:20.943

This worked for me, using slicing:

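df = pd.read_csv(...)  # path to your CSV file

df1 = df.iloc[:, n1:n2]

where n1 < n2 are zero-based column positions; the slice includes n1 and excludes n2. For example, to get the columns at positions 3 and 4, use

df1 = df.iloc[:, 3:5]

(A plain df[n1:n2] slices rows, not columns, so the iloc column indexer is needed.)

For the first column, use

df1 = df.iloc[:, 0]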
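A sketch on a hypothetical 6 x 6 frame showing why the column indexer matters: an integer slice inside plain `[]` selects rows, while `iloc[:, ...]` selects column positions with the end position excluded:

```python
import pandas as pd

# Hypothetical 6 x 6 frame: row labels 0..5, columns c0..c5
df = pd.DataFrame([[r * 10 + c for c in range(6)] for r in range(6)],
                  columns=[f"c{c}" for c in range(6)])

rows = df[3:5]                 # plain [] with an integer slice selects ROWS 3 and 4
cols = df.iloc[:, 3:5]         # iloc selects COLUMN positions 3 and 4 (end excluded)
cols_3_to_5 = df.iloc[:, 3:6]  # positions 3, 4 and 5
first = df.iloc[:, 0]          # first column as a Series
first_df = df.iloc[:, [0]]     # first column as a one-column DataFrame

print(rows.shape, cols.shape)     # (2, 6) (6, 2)
print(list(cols_3_to_5.columns))  # ['c3', 'c4', 'c5']
```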

I'm not sure how to select a discontinuous range of columns with a slice, though.
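For discontinuous columns, `iloc` also accepts a list of positions. The sketch below uses hypothetical data and also shows NumPy's `np.r_` helper (not mentioned in the answer), which concatenates several ranges into one array of positions:

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame with columns c0..c7
df = pd.DataFrame([list(range(8))], columns=[f"c{i}" for i in range(8)])

# A list of positions can skip around freely
picked = df.iloc[:, [0, 2, 5]]

# np.r_ joins slices into one integer array: 0, 1, 4, 5, 6
mixed = df.iloc[:, np.r_[0:2, 4:7]]

print(list(picked.columns))  # ['c0', 'c2', 'c5']
print(list(mixed.columns))   # ['c0', 'c1', 'c4', 'c5', 'c6']
```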

We can also use iloc. Given data in dataset2:

dataset2.iloc[:3,[1,2]]

This returns the first 3 rows of the second and third columns (positions 1 and 2; remember that numbering starts at 0).
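A minimal sketch of the same call on a hypothetical stand-in for dataset2, showing that `iloc[:3, [1, 2]]` combines a row slice with a list of column positions:

```python
import pandas as pd

# Hypothetical stand-in for dataset2
dataset2 = pd.DataFrame({"a": range(5), "b": range(10, 15), "c": range(20, 25)})

top = dataset2.iloc[:3, [1, 2]]  # rows 0-2, columns at positions 1 and 2
print(top)
```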

