I am new at Scikit-Learn and I want to convert a collection of data which I have already labelled into a dataset. I have converted the .csv file of the data into a NumPy array, however one problem I have run into is to classify the data into training set based on the presence of a flag in the second column. I want to know how to access a particular row, column of a .csv file using the Pandas Utility Module. The following is my code:
import numpy as np
import pandas as pd
import csv
import nltk
import pickle
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB,BernoulliNB
from nltk.classify import ClassifierI
from statistics import mode
def numpyfy(fileid):
data = pd.read_csv(fileid,encoding = 'latin1')
#pd.readline(data)
target = data["String"]
data1 = data.ix[1:,:-1]
#print(data)
return data1
def learn(fileid):
trainingsetpos = []
trainingsetneg = []
datanew = numpyfy(fileid)
if(datanew.ix['Status']==1):
trainingsetpos.append(datanew.ix['String'])
if(datanew.ix['Status']==0):
trainingsetneg.append(datanew.ix['String'])
print(list(trainingsetpos))