Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
279 votes, 15 answers

What is the difference between linear regression and logistic regression?

When we have to predict the value of a categorical (or discrete) outcome, we use logistic regression. I believe we also use linear regression to predict the value of an outcome given the input values. Then, what is the difference between the two…
London guy
209 votes, 12 answers

Can someone give an example of cosine similarity, in a very simple, graphical way?

Cosine Similarity article on Wikipedia. Can you show the vectors here (in a list or something) and then do the math, and let us see how it works?
TIMEX
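A minimal sketch of the calculation the question asks about, using plain Python lists as the vectors. The two vectors are made-up word counts chosen only for illustration; they do not come from the question or from the Wikipedia article.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))       # a . b
    norm_a = math.sqrt(sum(x * x for x in a))    # |a|
    norm_b = math.sqrt(sum(y * y for y in b))    # |b|
    return dot / (norm_a * norm_b)

# Hypothetical word-count vectors for two short texts.
doc_a = [2, 1, 0]
doc_b = [1, 1, 1]

# dot = 2*1 + 1*1 + 0*1 = 3, |a| = sqrt(5), |b| = sqrt(3)
similarity = cosine_similarity(doc_a, doc_b)     # 3 / sqrt(15)
```

Vectors pointing in the same direction score 1, and orthogonal vectors score 0, regardless of their lengths.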
203 votes, 20 answers

Difference between classification and clustering in data mining?

Can someone explain what the difference is between classification and clustering in data mining? If you can, please give examples of both to understand the main idea.
146 votes, 8 answers

How does the Amazon Recommendation feature work?

What technology goes on behind the scenes of Amazon's recommendation feature? I believe that Amazon's recommendations are currently the best in the market, but how do they provide us with such relevant recommendations? Recently, we have been involved…
Rachel
136 votes, 6 answers

Why is the F-Measure a harmonic mean and not an arithmetic mean of the Precision and Recall measures?

When we calculate the F-Measure considering both Precision and Recall, we take the harmonic mean of the two measures instead of a simple arithmetic mean. What is the intuitive reason behind taking the harmonic mean and not a simple average?
London guy
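One way to see the difference the question describes is to compute both means for a lopsided case. The sketch below uses the standard F1 formula; the precision and recall values are invented for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def arithmetic_mean(precision, recall):
    """Simple average, shown only for comparison."""
    return (precision + recall) / 2

# Hypothetical classifier: perfectly precise but finds almost nothing.
precision, recall = 1.0, 0.01

harmonic = f1(precision, recall)               # just under 0.02
simple = arithmetic_mean(precision, recall)    # about 0.505
```

The harmonic mean stays close to the smaller of the two values, so a model cannot score well by maximizing one measure and ignoring the other.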
132 votes, 3 answers

Why does one hot encoding improve machine learning performance?

I have noticed that when One Hot encoding is used on a particular data set (a matrix) and used as training data for learning algorithms, it gives significantly better results with respect to prediction accuracy, compared to using the original matrix…
maheshakya
120 votes, 8 answers

What is an intuitive explanation of the Expectation Maximization technique?

Expectation Maximization (EM) is a kind of probabilistic method to classify data. Please correct me if I am wrong and it is not a classifier. What is an intuitive explanation of this EM technique? What is expectation here and what is being…
109 votes, 6 answers

1D Number Array Clustering

So let's say I have an array like this: [1,1,2,3,10,11,13,67,71]. Is there a convenient way to partition the array into something like this: [[1,1,2,3],[10,11,13],[67,71]]? I looked through similar questions, yet most people suggested using k-means…
E.H.
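A minimal sketch of one simple approach for one-dimensional data: sort the values and start a new group wherever the gap between neighbours exceeds a threshold. This is a plain gap-based split, not k-means, and it is not necessarily what the answers recommend. The function name `split_by_gap` and the threshold of 5 are assumptions chosen to fit the array in the question.

```python
def split_by_gap(values, max_gap):
    """Group sorted values, opening a new group when a gap exceeds max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            groups.append([current])      # large jump: start a new group
        else:
            groups[-1].append(current)    # small jump: stay in the current group
    return groups

# max_gap=5 is a hand-picked, hypothetical threshold for this data.
groups = split_by_gap([1, 1, 2, 3, 10, 11, 13, 67, 71], max_gap=5)
# groups == [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

The result depends entirely on the threshold, so for other data it would have to be chosen from the distribution of gaps rather than fixed in advance.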
78 votes, 6 answers

Mixing categorical and continuous data in Naive Bayes classifier using scikit-learn

I'm using scikit-learn in Python to develop a classification algorithm to predict the gender of certain customers. Amongst others, I want to use the Naive Bayes classifier but my problem is that I have a mix of categorical data (ex: "Registered…
76 votes, 5 answers

What is the difference between Gradient Descent and Newton's Gradient Descent?

I understand what Gradient Descent does. Basically, it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand what the actual difference is between plain gradient descent and Newton's…
66 votes, 3 answers

What information can we access from the client?

I'm trying to compile a list of information that is accessible via JavaScript, such as: geo-location, IP address, browser software, exit location, entrance location. I understand that a user can alter any of this information and that its reliability is…
George Reith
58 votes, 7 answers

PCA For categorical features?

In my understanding, PCA can be performed only on continuous features. But while trying to understand the difference between one-hot encoding and label encoding, I came across a post at the following link: When to use One Hot Encoding vs…
data_person
53 votes, 1 answer

Decision tree vs. Naive Bayes classifier

I am doing some research about different data mining techniques and came across something that I could not figure out. If anyone has any idea, that would be great. In which cases is it better to use a Decision tree and other cases a Naive Bayes…
Y2theZ
52 votes, 11 answers

Calculate AUC in R?

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English? Page 9 of "AUC: a Better Measure..." seems to require knowing the class…
Andrew
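In plain terms, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes that directly by comparing every positive/negative pair. It is written in Python rather than R, assumes labels are coded as 1 and 0, and uses invented scores.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical scores and labels: 3 of the 4 pairs are ordered correctly.
result = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])   # 0.75
```

The pairwise loop is quadratic in the number of examples, so it suits small vectors; rank-based formulations give the same value more efficiently.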
50 votes, 6 answers

How many principal components to take?

I know that principal component analysis does an SVD on a matrix and then generates an eigenvalue matrix. To select the principal components, we have to take only the first few eigenvalues. Now, how do we decide on the number of eigenvalues that we…
London guy