Given NxM feature vectors as a numpy matrix, is there any routine that can cluster them with the k-means algorithm using the L1 distance (Manhattan distance)?
4 Answers
Here is a k-means implementation using the L1 distance (Manhattan distance). For generality, each feature vector is represented as a list, which is easy to convert to and from a numpy matrix.
import random

# Manhattan (L1) distance between two equal-length vectors
def L1(v1, v2):
    if len(v1) != len(v2):
        print("error")
        return -1
    return sum(abs(v1[i] - v2[i]) for i in range(len(v1)))

# k-means with L1 distance.
# rows refers to the NxM feature vectors
def kcluster(rows, distance=L1, k=4):  # Cited from Programming Collective Intelligence
    # Determine the minimum and maximum values for each feature
    ranges = [(min([row[i] for row in rows]), max([row[i] for row in rows]))
              for i in range(len(rows[0]))]

    # Create k randomly placed centroids
    clusters = [[random.random() * (ranges[i][1] - ranges[i][0]) + ranges[i][0]
                 for i in range(len(rows[0]))] for j in range(k)]

    lastmatches = None
    for t in range(100):
        print('Iteration %d' % t)
        bestmatches = [[] for i in range(k)]

        # Find which centroid is the closest for each row
        for j in range(len(rows)):
            row = rows[j]
            bestmatch = 0
            for i in range(k):
                d = distance(clusters[i], row)
                if d < distance(clusters[bestmatch], row):
                    bestmatch = i
            bestmatches[bestmatch].append(j)

        # If the results are the same as last time, this is complete
        if bestmatches == lastmatches:
            break
        lastmatches = bestmatches

        # Move the centroids to the average of their members
        for i in range(k):
            avgs = [0.0] * len(rows[0])
            if len(bestmatches[i]) > 0:
                for rowid in bestmatches[i]:
                    for m in range(len(rows[rowid])):
                        avgs[m] += rows[rowid][m]
                for j in range(len(avgs)):
                    avgs[j] /= len(bestmatches[i])
                clusters[i] = avgs
    return bestmatches
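If the data is already a numpy matrix, the assignment step above (looping over every row and every centroid) can also be vectorized with broadcasting. A minimal sketch, assuming `rows` is an (N, M) array and `clusters` a (k, M) array:

```python
import numpy as np

rng = np.random.default_rng(0)
rows = rng.random((10, 3))        # N=10 feature vectors of length M=3
clusters = rng.random((4, 3))     # k=4 centroids

# L1 (Manhattan) distance from every row to every centroid:
# result has shape (N, k) via broadcasting over an inserted axis.
dists = np.abs(rows[:, None, :] - clusters[None, :, :]).sum(axis=2)
labels = dists.argmin(axis=1)     # index of the closest centroid per row
```

This replaces the double Python loop with a single array expression, which matters once N or k grows.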

I don't think this is offered explicitly in scipy, but you should take a look at the following:

- 'The requested URL /scipy/ticket/612 was not found on this server.' – Cecilia Aug 01 '19 at 19:49
- URL is invalid. – Jonathan R Dec 10 '19 at 14:14
There's code under is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means, which uses any of the 20-odd metrics in scipy.spatial.distance. See also L1-or-L.5-metrics-for-clustering; could you comment on your results with L1 vs. L2?
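For reference, the scipy.spatial.distance metrics mentioned above include cityblock (L1), and cdist computes all pairwise point-to-centroid distances at once, which is exactly what a custom-metric assignment step needs. A small sketch with made-up points and centroids:

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
centroids = np.array([[0.0, 1.0], [5.0, 4.0]])

# (3, 2) matrix of L1 (cityblock) distances: rows = points, cols = centroids
d = cdist(X, centroids, metric='cityblock')
labels = d.argmin(axis=1)  # nearest centroid per point -> [0, 0, 1]
```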
Take a look at pyclustering. It includes a k-means implementation that can be configured to use the L1 distance, but you have to convert the numpy array into a list first.
To install pyclustering:
pip3 install pyclustering
A code snippet adapted from pyclustering (the original was missing the metric imports and the definition of initial_centers):
from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.utils.metric import distance_metric, type_metric

sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
# Pick 2 initial centers with k-means++
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()
manhattan_metric = distance_metric(type_metric.MANHATTAN)
kmeans_instance = kmeans(sample, initial_centers, metric=manhattan_metric)
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
