22

I'm trying to use Python to read my CSV file, extract specific columns into a pandas DataFrame, and display that DataFrame. However, instead of the DataFrame I get Series([], dtype: object) as output. The code I'm working with is below.

My document consists of these columns: product, sub_product, issue, sub_issue, consumer_complaint_narrative, company_public_response, company, state, zipcode, tags, consumer_consent_provided, submitted_via, date_sent_to_company, company_response_to_consumer, timely_response, consumer_disputed?, complaint_id

I want to extract: sub_product, issue, sub_issue, consumer_complaint_narrative

import pandas as pd

df=pd.read_csv("C:\\....\\consumer_complaints.csv")
df=df.stack(level=0)
df2 = df.filter(regex='[B-F]')
df[df2]
Yags
  • When you extract only one column that automatically becomes a `series`, do you want to forcefully make it a dataframe? – amrrs Feb 06 '18 at 11:12
  • yes want to make it a dataframe with columns B through F – Yags Feb 06 '18 at 11:16
  • You can simply use it like this: `df2 = df[['b','c','d','e','f']]`. Why are you using regex? – amrrs Feb 06 '18 at 11:18
  • it still produces Series([], dtype: object) as output – Yags Feb 06 '18 at 11:21
  • I think you need `df=pd.read_csv("C:\\....\\consumer_complaints.csv")` and `print (df.loc[:, 'B':'F'])`. – jezrael Feb 06 '18 at 11:22
  • But the best is to add some sample data to the question; please check [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Feb 06 '18 at 11:23
  • @jezrael I've already tried that. It doesn't work. it gives an hashtable error – Yags Feb 06 '18 at 11:25

3 Answers

58
import pandas as pd

input_file = "C:\\....\\consumer_complaints.csv"
df = pd.read_csv(input_file)  # read_csv already returns a DataFrame
cols = [1, 2, 3, 4]           # positions of the columns to keep
df = df[df.columns[cols]]

In cols, specify the positions of the columns you want to select. Column positions in a DataFrame start at 0, so [1, 2, 3, 4] selects the second through fifth columns.

You can also select columns by name, using the following line:

df = df[["Column Name","Column Name2"]]
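As a minimal sketch of both approaches, the snippet below reads a small in-memory CSV instead of the real file and selects the same columns by position and by name. The sample rows are invented for illustration; only the column names are taken from the question.

```python
import io
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
csv_text = (
    "product,sub_product,issue,sub_issue,consumer_complaint_narrative,company\n"
    "Mortgage,FHA mortgage,Late fee,Billing,Charged twice,Acme Bank\n"
    "Credit card,Rewards card,Fraud,Unknown charge,Card was stolen,Beta Corp\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Select by position: positions 1 to 4 (position 0 is "product").
by_position = df[df.columns[[1, 2, 3, 4]]]

# Select by name: a list of labels inside the indexing brackets.
by_name = df[["sub_product", "issue", "sub_issue", "consumer_complaint_narrative"]]

# Both selections should contain the same columns and values.
print(by_position.equals(by_name))
```

Selecting by name is usually the safer choice when the column order in the file might change.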
kepy97
  • thank you for your help. However, I still receive the hashtable error. – Yags Feb 06 '18 at 11:37
  • df=df["product", "sub_product", "issue", "sub_issue", "consumer_complaint_narrative", "complaint_id"] Traceback (most recent call last): File "", line 1, in df=df["product", "sub_product", "issue", "sub_issue", "consumer_complaint_narrative", "complaint_id"] KeyError: ('product', 'sub_product', 'issue', 'sub_issue', 'consumer_complaint_narrative', 'complaint_id') – Yags Feb 06 '18 at 11:46
  • I know it's reading the whole file and creating dataframe. The dataframe exists. It's just that I can't choose specific columns – Yags Feb 06 '18 at 11:47
  • You have to use double square brackets: df=df[["product", "sub_product", "issue", "sub_issue", "consumer_complaint_narrative", "complaint_id"]] – kepy97 Feb 06 '18 at 11:48
  • Hope the above comment solves your problem. If any other error occurs, let me know. – kepy97 Feb 06 '18 at 11:50
  • thank you @kepy97 that worked. Why do we have to put two square brackets? – Yags Feb 06 '18 at 11:51
  • With a single pair of brackets and one column name, a DataFrame returns a Series. Several names inside a single pair of brackets are treated as one tuple key, and since no column has that label pandas raises a KeyError. To select multiple columns, pass a list of names, which is why there are two pairs of brackets. Please [refer](http://pandas.pydata.org/pandas-docs/stable/indexing.html) to this document for more details. – kepy97 Feb 06 '18 at 12:03
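The bracket behaviour discussed in these comments can be checked with a short sketch. The two-row frame is made up for illustration and assumes ordinary (non-MultiIndex) columns, as in the question's file.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({
    "product": ["Mortgage", "Credit card"],
    "issue": ["Late fee", "Fraud"],
    "complaint_id": [1, 2],
})

one_column = df["issue"]                # single label -> Series
two_columns = df[["product", "issue"]]  # list of labels -> DataFrame

try:
    # Python reads this as the single tuple key ("product", "issue"),
    # and no column carries that label.
    df["product", "issue"]
    raised = False
except KeyError:
    raised = True

print(type(one_column).__name__, type(two_columns).__name__, raised)
```

If a one-column DataFrame is wanted rather than a Series, a one-element list such as df[["issue"]] does that.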
5

A simple way to achieve this would be as follows:

df = pd.read_csv("C:\\....\\consumer_complaints.csv")
df2 = df.loc[:,'B':'F']

Hope that helps.
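Note that a label slice like 'B':'F' assumes the columns really carry those labels. In the question's file the labels are the header names, so a sketch of the same idea would slice between two header names. The single data row below is invented for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
csv_text = (
    "product,sub_product,issue,sub_issue,consumer_complaint_narrative,company\n"
    "Mortgage,FHA mortgage,Late fee,Billing,Charged twice,Acme Bank\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Label slices with .loc include both endpoints.
df2 = df.loc[:, "sub_product":"consumer_complaint_narrative"]

print(list(df2.columns))
```

Unlike positional slicing, the end label is part of the result here.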

PaW
2

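This worked for me, using positional slicing with iloc:

df=pd.read_csv("C:\\....\\consumer_complaints.csv")

df1=df.iloc[:,n1:n2]

where n1<n2 are column positions and the column at position n2 is excluded. E.g., if you want the columns at positions 3 and 4, use

df1=df.iloc[:,3:5]

For the first column, use

df1=df.iloc[:,0]

Note that plain df[n1:n2] slices rows, not columns. A slice on its own cannot select a discontinuous range of columns.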

We can also use iloc with a list of column positions. Given data in dataset2,

dataset2.iloc[:3,[1,2]]

returns the top 3 rows of the second and third columns (remember that numbering starts at 0).
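Since the screenshots are not available, here is a small sketch of the same calls. The frame named dataset2 and its column names are a made-up stand-in, not the original data.

```python
import pandas as pd

# Hypothetical stand-in for dataset2 with five columns and four rows.
dataset2 = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 6, 7, 8],
    "c": [9, 10, 11, 12],
    "d": [13, 14, 15, 16],
    "e": [17, 18, 19, 20],
})

# First 3 rows of the columns at positions 1 and 2.
top = dataset2.iloc[:3, [1, 2]]

# A list of positions also handles columns that are not adjacent.
scattered = dataset2.iloc[:, [0, 2, 4]]

# A positional slice excludes its end: positions 1 and 2 only.
contiguous = dataset2.iloc[:, 1:3]

print(top)
print(list(scattered.columns), list(contiguous.columns))
```

Passing a list of positions is what covers the discontinuous case that a plain slice cannot express.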

MSIS