How to read a specific column of a CSV file using Python

Question

I am new to scikit-learn and want to convert a collection of data that I have already labelled into a dataset. I have converted the .csv file of the data into a NumPy array, but I am stuck on splitting the data into training sets based on the presence of a flag in the second column. How can I access a particular row and column of a .csv file using pandas? The following is my code:

    import numpy as np
    import pandas as pd
    import csv
    import nltk
    import pickle
    from nltk.classify.scikitlearn import SklearnClassifier
    from sklearn.naive_bayes import MultinomialNB,BernoulliNB
    from nltk.classify import ClassifierI
    from statistics import mode




    def numpyfy(fileid):
         data = pd.read_csv(fileid,encoding = 'latin1')
         #pd.readline(data)
         target = data["String"]
         data1 = data.ix[1:,:-1]
         #print(data)
         return data1
    def learn(fileid):
         trainingsetpos = []
         trainingsetneg = []
         datanew = numpyfy(fileid)
         if(datanew.ix['Status']==1):
            trainingsetpos.append(datanew.ix['String'])
         if(datanew.ix['Status']==0):
            trainingsetneg.append(datanew.ix['String'])

    print(list(trainingsetpos))

This might help: http://stackoverflow.com/a/25902357/1664675 – letsc, Aug 03 '15 at 18:24

score 0 · Answer 1 · answered Aug 03 '15 at 18:39

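You can use boolean indexing to split the data: comparing a column against a value gives a True/False mask, and indexing the DataFrame with that mask keeps only the matching rows. Something like: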

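    import pandas as pd


    def numpyfy(fileid):
        df = pd.read_csv(fileid, encoding='latin1')
        # Remove the 'String' column from df and keep it as the target
        target = df.pop('String')
        # .ix was removed in pandas 1.0; with a default integer index,
        # .iloc takes the same slice (skip the first row, drop the last column)
        data = df.iloc[1:, :-1]
        return target, data


    def learn(fileid):
        target, data = numpyfy(fileid)
        # Boolean indexing: keep only the rows where the mask is True
        trainingsetpos = data[data['Status'] == 1]
        trainingsetneg = data[data['Status'] == 0]

        print(trainingsetpos)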
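
As a minimal, self-contained sketch of the same idea, the snippet below reads a small in-memory CSV and uses boolean indexing with `.loc` to pull the `String` values for each `Status` flag. The sample rows and the helper name `split_by_status` are made up for illustration; only the column names `String` and `Status` come from the question. For a real file you would pass its path instead (plus `encoding='latin1'` if the file needs it).

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the asker's CSV file.
# Only the column names "String" and "Status" come from the question.
SAMPLE_CSV = """String,Status
good movie,1
bad movie,0
great plot,1
"""


def split_by_status(source):
    """Return (positive_strings, negative_strings) as plain Python lists."""
    df = pd.read_csv(source)
    # df["Status"] == 1 is a boolean mask; .loc[mask, "String"] selects
    # the "String" column for the rows where the mask is True.
    positive = df.loc[df["Status"] == 1, "String"].tolist()
    negative = df.loc[df["Status"] == 0, "String"].tolist()
    return positive, negative


trainingsetpos, trainingsetneg = split_by_status(io.StringIO(SAMPLE_CSV))
print(trainingsetpos)
print(trainingsetneg)
```

A single cell can be read in a similar way: `df.loc[0, "String"]` selects by row label and column name, while `df.iloc[0, 1]` selects by integer position.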
