Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

Because the categories of the training examples are known in advance (the examples are pre-labeled), classification is a type of supervised learning.

Some of the most important classification algorithms are support vector machines [svm], logistic regression, naive Bayes, random forests [random-forest] and artificial neural networks [neural-network].

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.

7859 questions

575

votes

5 answers

A simple explanation of Naive Bayes Classification

I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explain it with a simple step by step process in English. I understand it takes comparisons by times occurred as a probability, but I have no idea…

algorithm machine-learning dataset classification naivebayes

asked Apr 08 '12 at 00:56

Aeonitis

5,887
3
14
8
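
The "comparisons by times occurred" idea in the question above can be sketched as plain counting. This is a minimal illustration, not any library's implementation: the fruit data, the function names, and the use of add-one (Laplace) smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count label frequencies and per-feature value frequencies per label."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return label_counts, value_counts, vocab

def predict_naive_bayes(model, row):
    """Return the label with the highest log prior plus summed log likelihoods."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior: how common the label is
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption) keeps unseen values from zeroing the product.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (color, shape) -> fruit
rows = [("yellow", "long"), ("yellow", "long"), ("yellow", "round"),
        ("orange", "round"), ("orange", "round"), ("green", "long")]
labels = ["banana", "banana", "lemon", "orange", "orange", "banana"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("yellow", "long")))
print(predict_naive_bayes(model, ("orange", "round")))
```

The "naive" part is the inner loop: each feature contributes its own likelihood independently of the others, given the label.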

395

votes

6 answers

What are advantages of Artificial Neural Networks over Support Vector Machines?

ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's often not clear which method is better for a particular project, and I'm certain the answer is…

machine-learning neural-network classification svm

asked Jul 24 '12 at 13:59

Channel72

24,139
32
108
180

254

votes

6 answers

Save classifier to disk in scikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the scikit-learn website: from sklearn import datasets iris = datasets.load_iris() from sklearn.naive_bayes import…

python machine-learning scikit-learn classification

asked May 15 '12 at 00:06

garak

4,713
9
39
56

203

votes

20 answers

Difference between classification and clustering in data mining?

Can someone explain what the difference is between classification and clustering in data mining? If you can, please give examples of both to understand the main idea.

machine-learning classification cluster-analysis data-mining terminology

asked Feb 21 '11 at 10:39

Kristaps

2,047
2
14
5

136

votes

6 answers

Why is the F-Measure a harmonic mean and not an arithmetic mean of the Precision and Recall measures?

When we calculate the F-Measure considering both Precision and Recall, we take the harmonic mean of the two measures instead of a simple arithmetic mean. What is the intuitive reason behind taking the harmonic mean and not a simple average?

machine-learning classification data-mining

asked Oct 14 '14 at 08:22

London guy

27,522
44
121
179
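
One common intuition for the question above is that the harmonic mean is pulled toward the smaller of the two values, so a classifier cannot score well by maximizing only one of precision or recall. A small sketch, with precision and recall values chosen purely for illustration:

```python
def arithmetic_mean(precision, recall):
    return (precision + recall) / 2

def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative case: perfect precision, but almost nothing is recalled.
precision, recall = 1.0, 0.02
print(arithmetic_mean(precision, recall))  # about 0.51, looks acceptable
print(f1_score(precision, recall))         # about 0.039, exposes the weak recall
```

When precision and recall are equal, both means agree; they diverge more the further apart the two values are.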

123

votes

9 answers

How to fix RuntimeError "Expected object of scalar type Float but got scalar type Double for argument"?

I'm trying to train a classifier via PyTorch. However, I am experiencing problems with training when I feed the model with training data. I get this error on y_pred = model(X_trainTensor): RuntimeError: Expected object of scalar type Float but got…

python neural-network deep-learning classification pytorch

asked Jun 24 '19 at 17:05

Shawn Zhang

1,719
2
14
20

119

votes

20 answers

Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative

My problem: I have a dataset which is a large JSON file. I read it and store it in the trainList variable. Next, I pre-process it - in order to be able to work with it. Once I have done that I start the classification: I use the kfold cross…

python machine-learning scikit-learn classification supervised-learning

asked Jul 09 '15 at 17:19

Euskalduna

1,517
2
13
12
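
For a binary problem, the four quantities in the title above can be counted directly from true and predicted labels. The sketch below uses plain Python and made-up labels; the function name and the choice of 1 as the positive class are assumptions for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = tn = fp = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn

# Made-up labels for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```

In scikit-learn itself, `sklearn.metrics.confusion_matrix(y_true, y_pred).ravel()` yields the same four counts for binary labels, in the order tn, fp, fn, tp.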

113

votes

6 answers

scikit-learn .predict() default threshold

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability. In a binary classification problem, is scikit's classifier.predict() using 0.5 by default? If it doesn't, what's the default…

python machine-learning scikit-learn classification imbalanced-data

asked Nov 14 '13 at 18:00

ADJ

4,892
10
50
83
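
If class labels are needed at a cutoff other than 0.5, one option is to threshold the positive-class probabilities manually. The sketch below assumes those probabilities are already available as a list (for instance, taken from a classifier's `predict_proba` output); the function name and the numbers are illustrative, and it makes no claim about how any particular estimator implements `predict()`.

```python
def predict_with_threshold(positive_probabilities, threshold=0.5):
    """Label an item 1 when its positive-class probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in positive_probabilities]

# Illustrative probabilities for the positive class
probabilities = [0.05, 0.30, 0.45, 0.60, 0.90]

print(predict_with_threshold(probabilities))        # cutoff at 0.5
print(predict_with_threshold(probabilities, 0.25))  # lower cutoff flags more positives
```

With a rare positive class, lowering the threshold trades more false positives for fewer false negatives, so the cutoff is usually chosen against a validation set rather than fixed in advance.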

votes

10 answers

Higher validation accuracy than training accuracy using TensorFlow and Keras

I'm trying to use deep learning to predict income from 15 self-reported attributes from a dating site. We're getting rather odd results, where our validation data is getting better accuracy and lower loss than our training data. And this is…

tensorflow machine-learning neural-network keras classification

asked May 15 '17 at 12:22

Jasper

1,018
1
10
14

votes

5 answers

Scikit-learn train_test_split with indices

How do I get the original indices of the data when using train_test_split()? What I have is the following from sklearn.cross_validation import train_test_split import numpy as np data = np.reshape(np.randn(20),(10,2)) # 10 training examples labels =…

python scipy scikit-learn classification

asked Jul 20 '15 at 16:03

CentAu

10,660
15
59
85
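
One way to keep track of original positions is to split an index list and then use it to select rows. The sketch below uses only the standard library; the function name, the `test_fraction` and `seed` parameters, and the toy data are assumptions for illustration, not scikit-learn's API.

```python
import random

def split_with_indices(data, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices, split them, and return the indices with the selected rows."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return {
        "train_idx": train_idx,
        "test_idx": test_idx,
        "x_train": [data[i] for i in train_idx],
        "x_test": [data[i] for i in test_idx],
        "y_train": [labels[i] for i in train_idx],
        "y_test": [labels[i] for i in test_idx],
    }

# Toy data: 10 examples with 2 features each
data = [[i, i * 10] for i in range(10)]
labels = [i % 2 for i in range(10)]

split = split_with_indices(data, labels)
print(split["test_idx"])  # original positions of the held-out rows
```

The same idea carries over to `train_test_split`, which accepts several arrays at once: passing an index array alongside the data and labels returns the split indices together with the split data.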

votes

13 answers

How can I build a model to distinguish tweets about Apple (Inc.) from tweets about apple (fruit)?

See below for 50 tweets about "apple." I have hand labeled the positive matches about Apple Inc. They are marked as 1 below. Here are a couple of lines: 1|“@chrisgilmer: Apple targets big business with new iOS 7 features http://bit.ly/15F9JeF ”.…

python machine-learning classification

asked Jun 27 '13 at 20:20

SAL

votes

5 answers

What is the relation between the number of Support Vectors and training data and classifiers performance?

I am using LibSVM to classify some documents. The documents seem to be a bit difficult to classify, as the final results show. However, I have noticed something while training my models: if my training set is, for example, 1000, around 800…

machine-learning classification svm libsvm

asked Feb 28 '12 at 10:57

Hossein

40,161
57
141
175

votes

5 answers

Use scikit-learn to classify into multiple categories

I'm trying to use one of scikit-learn's supervised learning methods to classify pieces of text into one or more categories. The predict function of all the algorithms I tried just returns one match. For example I have a piece of text: "Theaters in…

python classification scikit-learn

asked May 10 '12 at 01:59

CodeMonkeyB

2,970
4
22
29

votes

6 answers

Mixing categorical and continuous data in Naive Bayes classifier using scikit-learn

I'm using scikit-learn in Python to develop a classification algorithm to predict the gender of certain customers. Amongst others, I want to use the Naive Bayes classifier but my problem is that I have a mix of categorical data (ex: "Registered…

python machine-learning data-mining classification scikit-learn

asked Jan 10 '13 at 09:08

user1499144

1,063
2
9
9

votes

11 answers

FailedPreconditionError: Attempting to use uninitialized in Tensorflow

I am working through the TensorFlow tutorial, which uses a "weird" format to upload the data. I would like to use the NumPy or pandas format for the data, so that I can compare it with scikit-learn results. I get the digit recognition data from…

python pandas classification tensorflow

asked Nov 30 '15 at 14:41

user3654387

2,240
4
19
20
