
I am very new to Python and machine learning. Below is my code in Python 3; I am writing it in a Jupyter notebook.

import random

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    # Whatever remains in copy after sampling is the test set.
    return [trainSet, copy]

import csv
import sys
from langdetect import detect
import random
import math


def loadCsv(filename):
    lines = csv.reader(open(filename, "r", encoding='latin1'))
    x = 0
    myList = []
    for line in lines:
        t = line[14]
        try:
            b = detect(t)
            if b == "en":
                myList.insert(x, t)
                x = x + 1
        except Exception:
            y = 0
    return myList


import nltk.classify.util
from nltk.classify import NaiveBayesClassifier 

filename = 'F:\\Study\\Text Mining (GIT)\\sources\\Data.csv'
splitRatio = 0.8
myList = loadCsv(filename)
trainingSet, testSet = splitDataset(myList, splitRatio)

classifier = nltk.NaiveBayesClassifier.train(trainingSet)
print (nltk.classify.util.accuracy(classifier, testSet))

classifier.show_most_informative_features()

After running the above code, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-206-75c0ffc409d5> in <module>()
 10 print(len(testSet))
 11 
 ---> 12 classifier = nltk.NaiveBayesClassifier.train(trainingSet)
 13 print (nltk.classify.util.accuracy(classifier, testSet))
 14 

 f:\python\lib\site-packages\nltk\classify\naivebayes.py in train(cls, labeled_featuresets, estimator)
 195         # Count up how many times each feature value occurred, given
 196         # the label and featurename.
 --> 197         for featureset, label in labeled_featuresets:
 198             label_freqdist[label] += 1
  199             for fname, fval in featureset.items():

 ValueError: too many values to unpack (expected 2)


 trainingSet=[ "Pleasant 10 min walk along the sea front to the Water Bus. restaurants etc. Hotel was comfortable breakfast was good - quite a variety. Room aircon didn't work very well. Take mosquito repelant!", "Really lovely hotel. Stayed on the very top floor and were surprised by a Jacuzzi bath we didn't know we were getting! Staff were friendly and helpful and the included breakfast was great! Great location and great value for money. Didn't want to leave!", 'We stayed here for four nights in October. The hotel staff were welcoming, friendly and helpful. Assisted in booking tickets for the opera. The rooms were clean and comfortable- good shower, light and airy rooms with windows you could open wide. Beds were comfortable. Plenty of choice for breakfast.Spa at hotel nearby which we used while we were there.', 'We stayed here for four nights in October. The hotel staff were welcoming, friendly and helpful. Assisted in booking tickets for the opera. The rooms were clean and comfortable- good shower, light and airy rooms with windows you could open wide. Beds were comfortable. Plenty of choice for breakfast.Spa at hotel nearby which we used while we were there.',.....]

I have looked at the following pages but couldn't find a solution:

ValueError: too many values to unpack (NLTK classifier)

NLTK ValueError: too many values to unpack (expected 2)

http://www.solutionscan.org/220106-python

ValueError : too many values to unpack (expected 2)

NLTK accuracy: "ValueError: too many values to unpack"

Anas Reza
  • Please show a sample of what your `trainingSet` looks like – desertnaut Sep 23 '18 at 17:58
  • @desertnaut do you mean I should print the trainingSet? – Anas Reza Sep 23 '18 at 19:10
  • A sample of it, yes, so we can see its structure – desertnaut Sep 23 '18 at 19:54
  • @desertnaut ['reviews.text', "Pleasant 10 min walk along the sea front to the Water Bus. restaurants etc. Hotel was comfortable breakfast was good - quite a variety. Room aircon didn't work very well. Take mosquito repelant!", "Really lovely hotel. Stayed on the very top floor and were surprised by a Jacuzzi bath we didn't know we were getting! Staff were friendly and helpful and the included breakfast was great! Great location and great value for money. Didn't want to leave!",] – Anas Reza Sep 23 '18 at 20:32

1 Answer

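Your input to train() is wrong. It expects a list of tuples, where the first element of each tuple is a dictionary (the featureset) and the second is the label. Your trainingSet is a list of plain strings, so the loop `for featureset, label in labeled_featuresets` tries to unpack each string into two names and fails with "too many values to unpack".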

def train(cls, labeled_featuresets, estimator=ELEProbDist):
    """
    :param labeled_featuresets: A list of classified featuresets,
        i.e., a list of tuples ``(featureset, label)``.
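    """

from nltk.classify import NaiveBayesClassifier

# One featureset: a dict mapping feature names to feature values.
label_features = []
dic = {}
dic['chipotle'] = 'mexican'
dic['burger'] = 'american'

# Pair the featureset with its label, as train() expects.
label_features.append((dic, 'food'))

NaiveBayesClassifier.train(label_features)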

>><nltk.classify.naivebayes.NaiveBayesClassifier object at 0x000001704916BDD8>
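
The unpacking failure in the traceback can be reproduced without NLTK. The sketch below uses a hypothetical helper, `count_labels`, that only mimics the `for featureset, label in ...` loop from naivebayes.py; it is not NLTK code, and the sample strings are shortened from the question.

```python
# Each element here is a plain string, like the asker's trainingSet,
# not a (featureset, label) tuple.
training_set = ["Really lovely hotel.", "Beds were comfortable."]

def count_labels(labeled_featuresets):
    """Hypothetical stand-in for the unpacking loop shown in the traceback."""
    counts = {}
    for featureset, label in labeled_featuresets:
        counts[label] = counts.get(label, 0) + 1
    return counts

try:
    count_labels(training_set)
except ValueError as err:
    # Python tries to split the string into exactly two items and fails.
    message = str(err)
    print(message)

# With correctly shaped input the same loop runs without error.
ok_counts = count_labels([({"lovely": True}, "pos")])
print(ok_counts)
```

A string with more than two characters cannot be unpacked into two names, which is why the error appears on the very first element.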

You can refer to the example in the NLTK documentation and print out the featureset values to understand the format.
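
As a rough sketch of how review strings could be reshaped into that format, assuming a simple bag-of-words featureset: `word_features` and the "pos"/"neg" labels below are hypothetical. The column loaded in the question holds only review text, so real labels would have to come from elsewhere, for instance a rating column if the CSV has one.

```python
def word_features(text):
    """Turn a review string into a bag-of-words featureset (a dict)."""
    return {word: True for word in text.lower().split()}

# Hypothetical labels, invented for illustration only.
labeled_reviews = [
    ("Really lovely hotel. Staff were friendly and helpful.", "pos"),
    ("Room aircon didn't work very well.", "neg"),
]

# Build the list of (featureset, label) tuples described in the docstring.
labeled_featuresets = [
    (word_features(text), label) for text, label in labeled_reviews
]

# Every element now unpacks cleanly into two values.
for featureset, label in labeled_featuresets:
    print(label, len(featureset))
```

A list shaped like `labeled_featuresets` is what should be split into training and test sets and then passed to NaiveBayesClassifier.train(), rather than the raw list of strings.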

Morse