Mostly code, implementing the algorithm in http://en.wikipedia.org/wiki/Geometric_median#Computation
and gives you an example of use based on a set of random points.
NB: this is for points in a plane, because I can't decide how two spherical coordinates have to be summed... hence you have to map the spherical coordinates with a planar projection beforehand, but this point has been already touched in a previous answer.
code
from math import sqrt
from random import seed, uniform
from operator import add
seed(100)
class Point():
"""Very basic point class, supporting "+", scalar "/" and distances."""
def __init__(self, x, y):
self.x = x
self.y = y
def __repr__(self):
return "("+repr(self.x)+","+repr(self.y)+")"
def __add__(self, P):
return Point(self.x+P.x, self.y+P.y)
def __div__(self, scalar):
return Point(self.x/float(scalar), self.y/float(scalar))
def delta(self, P):
dx = self.x - P.x
dy = self.y - P.y
return sqrt(dx*dx+dy*dy)
def iterate(GM,points):
"Simple implementation of http://en.wikipedia.org/wiki/Geometric_median#Computation"
# distances from the tentative GM
distances = [GM.delta(p) for p in points]
normalized_positions = [p/d for p,d in zip(points,distances)]
normalization_factor = sum(1.0/d for d in distances)
new_median = reduce(add, normalized_positions)/normalization_factor
return new_median
# The "clients"
nclients = 10
points = [Point(uniform(-3,3),uniform(-3,3)) for i in range(nclients)]
# Centroid of clients and total of distances
centroid = reduce(add,points)/nclients
print "Position of centroid:",centroid
print "Sum of distances from centroid:",
print reduce(add,[centroid.delta(p) for p in points])
print
print "Compute the Geometric Median using random starting points:"
nstart = 10
for P0 in [Point(uniform(-5,5),uniform(-5,5)) for i in range(nstart)]:
p0 = P0
for i in range(10):
P0 = iterate(P0, points)
print p0,"-->",P0
print
print "Sum of distances from last Geometric Median:",
print reduce(add,[P0.delta(p) for p in points])
output
Position of centroid: (-0.4647467432024398,0.08675910209912471)
Sum of distances from centroid: 22.846445119
Compute the Geometric Median using random starting points:
(1.2632163919279735,4.633157837008632) --> (-0.8739691868669638,-0.019827884361901298)
(-2.8916600791314986,4.561006461166512) --> (-0.8929310891388812,-0.025857080003665663)
(0.5539966580106901,4.011520429873922) --> (-0.8764828849474395,-0.020607834485528134)
(3.1801819335743033,-3.395781900250662) --> (-0.8550062003820846,-0.014134334529992666)
(1.48542908120573,-3.7590671941155627) --> (-0.8687797019011291,-0.018241177226221747)
(-4.943549141082007,-1.044838193982506) --> (-0.9066276248482427,-0.030440865315529194)
(2.73500702168781,0.6615770729288597) --> (-0.8231318436739281,-0.005320464433689587)
(-3.073593440129266,3.411747144619733) --> (-0.8952513352350909,-0.026600471220747438)
(4.137768422492282,-2.6277493707729596) --> (-0.8471586848200597,-0.011875801531868494)
(-0.5180751681772549,1.377998063140823) --> (-0.8849056106235963,-0.02326386487180884)
Sum of distances from last Geometric Median: 22.7019120091
my own comment
In this case the locations (centroid vs GM) are quite different, but the results are similar. I expect significant differences in both the locations and the mean distances when you have some sort of clustering around a point (a city), or about some features say a line (a road) etc
Eventually, one can speed up things using numpy
, I've avoided doing numpy
due to limited time resources :)