
Given NxM feature vectors as a numpy matrix, is there any routine that can cluster them with the k-means algorithm using L1 (Manhattan) distance?

JustInTime

4 Answers


Here is a k-means implementation using L1 (Manhattan) distance. For generality, the feature vectors are represented as a plain list of lists, which is easy to convert to and from a numpy matrix.

    import random

    # Manhattan (L1) distance between two equal-length vectors
    def L1(v1, v2):
        if len(v1) != len(v2):
            print("error")
            return -1
        return sum(abs(v1[i] - v2[i]) for i in range(len(v1)))

    # k-means with L1 distance.
    # rows refers to the NxM feature vectors
    def kcluster(rows, distance=L1, k=4):  # Cited from Programming Collective Intelligence
        # Determine the minimum and maximum values for each feature
        ranges = [(min(row[i] for row in rows), max(row[i] for row in rows))
                  for i in range(len(rows[0]))]

        # Create k randomly placed centroids
        clusters = [[random.random() * (ranges[i][1] - ranges[i][0]) + ranges[i][0]
                     for i in range(len(rows[0]))] for j in range(k)]

        lastmatches = None
        for t in range(100):
            print('Iteration %d' % t)
            bestmatches = [[] for i in range(k)]
            # Assignment step: find the closest centroid for each row
            for j in range(len(rows)):
                row = rows[j]
                bestmatch = 0
                for i in range(k):
                    d = distance(clusters[i], row)
                    if d < distance(clusters[bestmatch], row):
                        bestmatch = i
                bestmatches[bestmatch].append(j)
            # If the assignments are the same as last time, the clustering has converged
            if bestmatches == lastmatches:
                break
            lastmatches = bestmatches

            # Update step: move each centroid to the average of its members
            for i in range(k):
                avgs = [0.0] * len(rows[0])
                if len(bestmatches[i]) > 0:
                    for rowid in bestmatches[i]:
                        for m in range(len(rows[rowid])):
                            avgs[m] += rows[rowid][m]
                    for j in range(len(avgs)):
                        avgs[j] /= len(bestmatches[i])
                    clusters[i] = avgs
        return bestmatches
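
For example, here is a minimal usage sketch (the array name `data` is hypothetical): convert the numpy matrix to a list of lists with `tolist()` and pass it to `kcluster`:

    import numpy as np

    data = np.random.rand(100, 5)                      # hypothetical NxM feature matrix
    clusters = kcluster(data.tolist(), distance=L1, k=4)
    # clusters[i] holds the row indices assigned to centroid i
    print([len(c) for c in clusters])
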
junwangbuaa

I don't think this is offered explicitly in scipy, but you should take a look at the following:

http://projects.scipy.org/scipy/ticket/612

JoshAdel

There's code under is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means, which uses any of the 20-odd metrics in scipy.spatial.distance. See also L1-or-L.5-metrics-for-clustering; could you comment on your results with L1 vs. L2?
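
As a minimal sketch of that approach (assuming an NxM numpy array `X`; the function name `kmeans_l1` and the median update step are illustrative additions, not code from the linked answers), the assignment step can use scipy.spatial.distance.cdist with the 'cityblock' metric, and the update step can take the coordinate-wise median, which is the L1-optimal centroid:

    import numpy as np
    from scipy.spatial.distance import cdist

    def kmeans_l1(X, k=4, n_iter=100, seed=0):
        # X: (N, M) numpy array of feature vectors
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: L1 ("cityblock") distance from every point to every centroid
            labels = cdist(X, centroids, metric='cityblock').argmin(axis=1)
            # Update step: the coordinate-wise median minimizes total L1 distance
            new_centroids = np.array([
                np.median(X[labels == j], axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return centroids, labels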

denis

Take a look at pyclustering. It provides a k-means implementation that can be configured to use the L1 distance, but you have to convert the numpy array into a list first.

To install pyclustering:

    pip3 install pyclustering

A code snippet adapted from the pyclustering documentation (the `distance_metric` import and the `initial_centers` line, which were missing, are filled in here):

    from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample
    from pyclustering.utils.metric import distance_metric, type_metric

    # Load a bundled 2D sample dataset
    sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

    # Pick initial centers with k-means++ (2 clusters for this sample)
    initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

    # Run k-means with the Manhattan (L1) metric
    manhattan_metric = distance_metric(type_metric.MANHATTAN)
    kmeans_instance = kmeans(sample, initial_centers, metric=manhattan_metric)
    kmeans_instance.process()
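
To run it on your own NxM numpy matrix instead of the bundled sample, a minimal sketch (the array name `data` is hypothetical) is to convert the array to a list and read the results back with the instance's accessors:

    import numpy as np

    data = np.random.rand(100, 5)              # hypothetical NxM feature matrix
    sample = data.tolist()                     # pyclustering expects a plain list of points

    initial_centers = kmeans_plusplus_initializer(sample, 4).initialize()
    kmeans_instance = kmeans(sample, initial_centers, metric=manhattan_metric)
    kmeans_instance.process()

    clusters = kmeans_instance.get_clusters()  # lists of row indices per cluster
    centers = kmeans_instance.get_centers()    # final centroid coordinates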

Marian Lux