I am very new to machine learning and have a problem to solve with supervised learning.

Problem: I have training data in a .csv file where column 1 is the data (user emails) and column 2 is the corresponding label (the category I want to classify each email into). The classifier should learn from this training data and, when later given new data, assign it one of the labels seen during training. I also want a per-classification confidence score (as a percentage) so I can judge how reliable each classification is.
Here is the code I am trying:
    import random

    import pandas as pd
    import nltk

    def clean_data(data):
        # Strip newlines, quotes and backslashes from the raw email text
        # ('\r\n' must be replaced before '\n' and '\r', or it is never matched)
        return (str(data).replace('\r\n', '').replace('\n', '').replace('\r', '')
                .replace('\'', '').replace('\\', ''))

    def data_features(word):
        # Use the whole cleaned text as a single feature value
        return {'test_data': word}

    def clean_data_feature(word):
        return data_features(clean_data(word))

    def classifydata(filename, datacolumn, labelcolumn):
        df = pd.read_csv(filename, encoding='latin1', index_col=None,
                         dtype={datacolumn: str})
        subset = df[[datacolumn, labelcolumn]]
        labeled_names = [tuple(x) for x in subset.values]
        random.shuffle(labeled_names)
        featuresets = [(clean_data_feature(n), label) for (n, label) in labeled_names]
        # NOTE: train and test are currently the same data
        train_set, test_set = featuresets, featuresets
        classifier = nltk.NaiveBayesClassifier.train(train_set)
        df = pd.read_csv('D:/ML/Event_Data_601-700.csv', encoding='latin1',
                         index_col=None, dtype={'mMsgContent': str})
        for data in df['mMsgContent']:
            print(classifier.classify(clean_data_feature(data)))

    classifydata('D:/ML/Event_Data_Training_600.csv', 'mMsgContent', 'call related to')
This prints the classification learned from the training data, but I also want to know how confident the classifier is, as a percentage, that each per-record classification is accurate.

Any help, suggestions, or improvements to the way this code is written are appreciated; let me know if I should add more details.
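For per-record confidence, `nltk.NaiveBayesClassifier` provides `prob_classify`, which returns a probability distribution over the labels instead of just the winning label. A minimal sketch on toy data (the feature name `test_data` matches the code above; the example emails and labels are made up for illustration):

```python
import nltk

# Toy training set in the same shape as the featuresets built above:
# a feature dict mapping 'test_data' to the cleaned text, paired with a label.
train_set = [
    ({'test_data': 'please reset my password'}, 'password'),
    ({'test_data': 'reset password link expired'}, 'password'),
    ({'test_data': 'invoice for last month'}, 'billing'),
    ({'test_data': 'billing invoice is wrong'}, 'billing'),
]
classifier = nltk.NaiveBayesClassifier.train(train_set)

feats = {'test_data': 'please reset my password'}
dist = classifier.prob_classify(feats)  # probability distribution over labels
label = dist.max()                      # same label classify() would return
confidence = dist.prob(label)           # probability of that label, 0..1
print(label, round(confidence * 100, 2), '%')
```

In the loop over `df['mMsgContent']`, calling `prob_classify(clean_data_feature(data))` in place of `classify` gives both the label and its probability. Note that because each whole email is a single feature value, unseen emails carry little information for Naive Bayes; splitting the text into word features usually gives more meaningful probabilities.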