I am using some sample code (below) to test a Naive Bayes classifier, and I'm getting the following error from line 22:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

This is a sample row of the CSV file:

b8:27:eb:38:72:a7,df598b5eb8f4,5/9/16 14:47,154aec250ef6,-84,outside

Sample code:

from sklearn.preprocessing import LabelBinarizer
import numpy as np
from sklearn import naive_bayes
import csv
import random
from sklearn import metrics
import urllib
url = "example.com"
webpage = urllib.urlopen(url)
# download the file
#raw_data = urllib.urlopen(url)

datareader = csv.reader(webpage) #line 22 is this one

ct = 0;
for row in datareader:
  ct = ct+1
webpage = urllib.urlopen(url)
datareader = csv.reader(webpage)
data = np.array(-1*np.ones((ct,6),float),object);
k=0;
for row in datareader:
    data[k,:] = np.array(row)
    k = k+1;

featnames = np.array(['unti','dongle','timestamp','tracker','rssi','label'],str)

keys = [[]]*np.size(data,1)
numdata = -1*np.ones_like(data);

for k in range(np.size(data,1)):
    keys[k],garbage,numdata[:,k] = np.unique(data[:,k],True,True)

numrows = np.size(numdata,0);
numcols = np.size(numdata,1);
numdata = np.array(numdata, int)
xdata = numdata[:,:-1]
ydata = numdata[:,-1]

lbin = LabelBinarizer();
for k in range(np.size(xdata,1)):
 if k==0:
   xdata_ml = lbin.fit_transform(xdata[:,k]);
 else:
   xdata_ml = np.hstack((xdata_ml,lbin.fit_transform(xdata[:,k])))
ydata_ml = lbin.fit_transform(ydata)


allIDX = np.arange(numrows);
random.shuffle(allIDX);
holdout_number = numrows/10;
testIDX = allIDX[0:holdout_number];
trainIDX = allIDX[holdout_number:];

xtest = xdata_ml[testIDX,:];
xtrain = xdata_ml[trainIDX,:];
ytest = ydata[testIDX];
ytrain = ydata[trainIDX];

mnb = naive_bayes.MultinomialNB();
mnb.fit(xtrain,ytrain);
print "Classification accuracy of MNB =", mnb.score(xtest,ytest)

Can anyone help me find the error and suggest a fix?

DataGuy

2 Answers


Are you using Windows? If so, this may solve it:

datareader = csv.reader(webpage, dialect=csv.excel_tab)
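
One thing worth checking before relying on this: csv.excel_tab is the tab-delimited variant of the Excel dialect, so on comma-separated data it will probably return each line as a single field. The snippet below is only a rough sketch in Python 3 syntax (the question uses Python 2), reusing the sample row from the question to compare the two dialects:

```python
import csv

# Sample row copied from the question
line = "b8:27:eb:38:72:a7,df598b5eb8f4,5/9/16 14:47,154aec250ef6,-84,outside"

# Default dialect (csv.excel) splits on commas
default_row = next(csv.reader([line]))

# csv.excel_tab splits on tab characters instead
tab_row = next(csv.reader([line], dialect=csv.excel_tab))

print(len(default_row))  # 6 fields
print(len(tab_row))      # 1 field: the whole line
```

If the file really is comma-separated, the dialect alone is unlikely to be the fix; the newline handling matters more.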
silviomoreto

Some of the answers to the question "CSV new-line character seen in unquoted field error" refer to CSV files created on a Mac.

Can you download the file manually to your Mac and try the following with it as a local file:

1) Save the file as CSV (MS-DOS Comma-Separated)

2) Save the file as CSV (Windows Comma-Separated)

3) Run the following script

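import csv
csv_filename = 'data.csv'  # hypothetical local path to the downloaded file
# 'rU' opens the file in universal-newline mode (Python 2)
with open(csv_filename, 'rU') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print ', '.join(row)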

Explanation of 'rU', quoted from PEP 278: https://www.python.org/dev/peps/pep-0278/

In a Python with universal newline support open() the mode parameter can also be "U", meaning "open for input as a text file with universal newline interpretation". Mode "rU" is also allowed, for symmetry with "rb".

Rationale

Universal newline support is implemented in C, not in Python. This is done because we want files with a foreign newline convention to be import-able, so a Python Lib directory can be shared over a remote file system connection, or between MacPython and Unix-Python on Mac OS X.
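
If you would rather keep reading from the URL than from a local file, the same idea can be applied to the downloaded bytes. This is only a sketch in Python 3 syntax, under a few assumptions: the helper name parse_csv_bytes is made up for illustration, the data is assumed to be UTF-8, and the sample bytes simply repeat the row from the question with bare carriage returns as line endings (the old Mac convention that typically triggers this error).

```python
import csv

def parse_csv_bytes(raw, encoding="utf-8"):
    """Decode raw bytes and parse them as CSV, accepting CR, LF or CRLF line endings."""
    text = raw.decode(encoding)
    # str.splitlines() splits on every common line boundary, including a bare '\r',
    # so the csv reader never sees a stray newline character inside a field.
    lines = text.splitlines()
    # Skip empty rows produced by blank lines.
    return [row for row in csv.reader(lines) if row]

# Hypothetical payload: the sample row from the question, twice, terminated by '\r'
sample = (
    b"b8:27:eb:38:72:a7,df598b5eb8f4,5/9/16 14:47,154aec250ef6,-84,outside\r"
    b"b8:27:eb:38:72:a7,df598b5eb8f4,5/9/16 14:47,154aec250ef6,-84,outside\r"
)

rows = parse_csv_bytes(sample)
print(len(rows))     # 2
print(len(rows[0]))  # 6
```

Note that splitting on line boundaries first would break quoted fields that contain embedded newlines; the sample data has none, so that case is not handled here. It also gives you all rows in one pass, so there is no need to download the file twice just to count them.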

Yaron