Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If your question does not directly concern implementation, it is probably off-topic here; consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Please choose one site only and do not cross-post. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

462

votes

8 answers

Cluster analysis in R: determine the optimal number of clusters

How can I choose the best number of clusters for a k-means analysis? After plotting a subset of the data below, how many clusters would be appropriate? How can I perform cluster dendro analysis? n = 1000 kk = 10 x1 = runif(kk) y1 = runif(kk) z1 =…

r cluster-analysis k-means

asked Mar 13 '13 at 02:39

user2153893

4,657
3
13
5

228

votes

10 answers

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

python machine-learning cluster-analysis k-means scikit-learn

asked Apr 03 '11 at 12:39

bmasc

2,410
2
15
9

203

votes

20 answers

Difference between classification and clustering in data mining?

Can someone explain what the difference is between classification and clustering in data mining? If you can, please give examples of both to understand the main idea.

machine-learning classification cluster-analysis data-mining terminology

asked Feb 21 '11 at 10:39

Kristaps

2,047
2
14
5

154

votes

20 answers

How do I determine k when using k-means clustering?

I've been studying k-means clustering, and one thing that's not clear is how you choose the value of k. Is it just a matter of trial and error, or is there more to it?

cluster-analysis k-means

asked Nov 24 '09 at 22:58

Jason Baker

192,085
135
376
510
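One common heuristic for the question above is the "elbow" approach: run k-means for several values of k and look at where the within-cluster sum of squares stops dropping sharply. The following is only a minimal sketch under stated assumptions, not a definitive method. The toy data, the helper names (sq_dist, farthest_first, kmeans, inertia), and the deterministic farthest-first initialization are all illustrative choices that do not come from the question; real implementations usually use random or k-means++ initialization with several restarts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns the final list of centers."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers

def inertia(points, centers):
    """Within-cluster sum of squares: each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical toy data: three tight, well-separated blobs of four points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (4.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Inertia for k = 1..5; the "elbow" is where the curve flattens.
curve = {k: inertia(points, kmeans(points, k)) for k in range(1, 6)}
for k, value in curve.items():
    print(k, round(value, 3))
```

On this toy data the inertia falls steeply up to k = 3 and changes only slightly afterwards, which is the kind of bend the elbow heuristic looks for. On real data the bend is often much less clear, so the result is a hint rather than an answer.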

120

votes

8 answers

What is an intuitive explanation of the Expectation Maximization technique?

Expectation Maximization (EM) is a kind of probabilistic method to classify data. Please correct me if I am wrong and it is not a classifier. What is an intuitive explanation of this EM technique? What is expectation here and what is being…

machine-learning cluster-analysis data-mining mathematical-optimization expectation-maximization

asked Aug 04 '12 at 10:56

London guy

27,522
44
121
179

109

votes

6 answers

1D Number Array Clustering

So let's say I have an array like this: [1,1,2,3,10,11,13,67,71] Is there a convenient way to partition the array into something like this? [[1,1,2,3],[10,11,13],[67,71]] I looked through similar questions yet most people suggested using k-means…

arrays cluster-analysis data-mining dimension partition-problem

asked Jul 16 '12 at 22:25

E.H.

3,271
4
19
18
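For one-dimensional data like the array in the question above, one simple option is to sort the values and start a new group wherever the gap between neighbours exceeds a threshold. The sketch below assumes this gap-based approach; the function name split_by_gap and the threshold max_gap=5 are hypothetical choices made for this example, and a suitable threshold depends on the data.

```python
def split_by_gap(values, max_gap):
    """Sort the values and start a new group whenever two consecutive
    values differ by more than max_gap (a hypothetical parameter)."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            groups.append([cur])      # gap too large: open a new group
        else:
            groups[-1].append(cur)    # close enough: extend the current group
    return groups

data = [1, 1, 2, 3, 10, 11, 13, 67, 71]
groups = split_by_gap(data, max_gap=5)
print(groups)  # [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

This avoids choosing a number of clusters up front, at the cost of having to pick a gap threshold instead.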

votes

7 answers

Unsupervised clustering with unknown number of clusters

I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance between each other less than a threshold "T". I do not know how many…

algorithm math artificial-intelligence machine-learning cluster-analysis

asked Apr 13 '12 at 06:54

London guy

27,522
44
121
179
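The constraint in the question above (every pair of vectors in a cluster closer than a threshold T) can be met by a simple greedy pass, sketched below. This is only an illustration under stated assumptions: the function name greedy_diameter_clusters and the sample vectors are hypothetical, the result depends on input order, and it makes no attempt to minimize the number of clusters.

```python
import math

def greedy_diameter_clusters(vectors, threshold):
    """Put each vector into the first cluster whose members are ALL
    closer than threshold to it; otherwise open a new cluster."""
    clusters = []
    for v in vectors:
        for members in clusters:
            if all(math.dist(v, m) < threshold for m in members):
                members.append(v)
                break
        else:
            # No existing cluster accepts v without breaking the constraint.
            clusters.append([v])
    return clusters

# Hypothetical 3-D sample data and threshold T = 1.0.
vectors = [(0, 0, 0), (0.5, 0, 0), (5, 5, 5), (5.2, 5, 5), (0, 0.4, 0)]
clusters = greedy_diameter_clusters(vectors, threshold=1.0)
for members in clusters:
    print(members)
```

Each vector is compared against every member of every existing cluster, so the cost grows quadratically in the worst case; for a large data set a spatial index or a hierarchical method with a distance cutoff may be more practical.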

votes

18 answers

K-means algorithm variation with equal cluster size

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promising, but does not produce equally sized groups. Is there a variation of this…

algorithm dictionary cluster-analysis k-means

asked Mar 27 '11 at 21:27

pixelistik

7,541
3
32
42

votes

2 answers

plotting results of hierarchical clustering on top of a matrix of data

How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is the following figure: This is Figure 6 from: A panel of induced pluripotent stem cells from chimpanzees: a…

python matplotlib scipy cluster-analysis dendrogram

asked Jun 06 '10 at 02:50

user248237

votes

3 answers

Scikit Learn - K-Means - Elbow - criterion

Today I'm trying to learn something about k-means. I have understood the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the right k, but I do not understand how to use it with…

python machine-learning scikit-learn cluster-analysis k-means

asked Oct 05 '13 at 12:19

Linda

2,375
4
30
33

votes

8 answers

Python k-means algorithm

I am looking for a Python implementation of the k-means algorithm, with examples, to cluster and cache my database of coordinates.

python algorithm cluster-analysis k-means

asked Oct 09 '09 at 19:16

Eeyore

2,126
7
33
49

votes

6 answers

How to get the samples in each cluster?

I am using the sklearn.cluster KMeans package. Once I finish the clustering, how can I find out which values were grouped together? Say I had 100 data points and KMeans gave me 5 clusters. Now I want to know which data points are in…

python scikit-learn cluster-analysis k-means

asked Mar 24 '16 at 07:56

user77005

1,769
4
18
26
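For the question above, a fitted scikit-learn KMeans object exposes one cluster label per input sample in its labels_ attribute, so the samples can be grouped by that label. The sketch below shows only the grouping step in plain Python; the labels list is a hard-coded stand-in for kmeans.labels_, and the sample data and the helper name group_by_label are hypothetical.

```python
from collections import defaultdict

def group_by_label(samples, labels):
    """Map each cluster label to the list of samples that carry it."""
    clusters = defaultdict(list)
    for sample, label in zip(samples, labels):
        clusters[label].append(sample)
    return dict(clusters)

# Hypothetical data; in practice labels would come from kmeans.labels_
# after fitting, in the same order as the samples passed to fit().
samples = [(1.0, 1.1), (0.9, 1.0), (8.0, 8.2), (8.1, 7.9), (4.0, 0.1)]
labels = [0, 0, 1, 1, 2]

members = group_by_label(samples, labels)
for label, group in members.items():
    print(label, group)
```

The grouping relies on the labels being in the same order as the samples, which is why they are zipped together rather than looked up separately.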

votes

9 answers

scikit-learn: Predicting new points with DBSCAN

I am using DBSCAN to cluster some data using Scikit-Learn (Python 2.7): from sklearn.cluster import DBSCAN dbscan = DBSCAN(random_state=0) dbscan.fit(X) However, I found that there was no built-in function (aside from "fit_predict") that could…

machine-learning scikit-learn cluster-analysis data-mining dbscan

asked Jan 07 '15 at 15:27

slaw

6,591
16
56
109

votes

5 answers

Plot dendrogram using sklearn.AgglomerativeClustering

I'm trying to build a dendrogram using the children_ attribute provided by AgglomerativeClustering, but so far I'm out of luck. I can't use scipy.cluster since agglomerative clustering provided in scipy lacks some options that are important to me…

python plot cluster-analysis dendrogram

asked Mar 18 '15 at 16:07

Shukhrat Khannanov

votes

4 answers

kmeans: Quick-TRANSfer stage steps exceeded maximum

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following error: Quick-TRANSfer stage steps exceeded maximum…

r cluster-analysis k-means

asked Jan 27 '14 at 13:55

Anna Dunietz
