Functional programmers love recursion, yet they go to great lengths to avoid writing it. Jeez, people, make up your minds!
I like to structure my code, to the extent possible, using common, well-understood combinators. I want to demonstrate a style of Haskell programming which leans heavily on standard tools to implement the boring parts of a program (mapping, zipping, looping) as tersely and generically as possible, freeing you up to focus on the problem at hand.
So don't worry if you don't understand everything here. I just want to show you what's possible! (And please ask if you have questions!)
Vectors
First things first: we're working with two-dimensional space, so we'll need two-dimensional vectors and some secondary school vector algebra to work with them.
I'm going to parameterise my vector by the scalar on which our vector space is built. This'll allow me to work with standard type classes like Functor
, so I can delegate a lot of the work of building a vector algebra to the machine. I've turned on DeriveFunctor
and DeriveFoldable
, which allow me to utter the magic words deriving (Functor, Foldable)
.
data Pair a = Pair {
px :: a,
py :: a
} deriving (Show, Functor, Foldable)
Hereafter I'm going to avoid working explicitly with Pair
, and program to an interface, not an implementation. This'll allow me to build a simple linear algebra library in a manner that's independent of the dimensionality of the vector space. I'll give example type signatures in terms of V2
:
type V2 = Pair Double
Scalar multiplication: functors
A vector space is required to have two operations: scalar multiplication and vector addition. Scalar multiplication means multiplying each component of a vector by a constant scalar. If you view a vector as a container of components, it should be clear that this means "do the same thing to every element in a container" - that is, it's a mapping operation. That's what Functor
is for.
-- mul :: Double -> V2 -> V2
mul :: (Functor f, Num n) => n -> f n -> f n
mul k f = fmap (k *) f
Vector addition: zippy applicatives
Vector addition involves adding up the components of a vector point-wise. Thinking of a vector as a container of components, addition is a zipping operation - match up each element of the two vectors and add them up.
Applicative functors are functors with an additional "apply" operation. Thinking of a functor f
as a container, Applicative
's <*> :: f (a -> b) -> f a -> f b
gives you a way to take a container of functions and apply it to a container of values to get a new container of values. It should be clear that one way to make Pair
into an Applicative
is to use zipping to apply functions to values.
instance Applicative Pair where
pure x = Pair x x
Pair f g <*> Pair x y = Pair (f x) (g y)
(For another example of a zippy applicative, see this answer of mine.)
Now that we have a way to zip two pairs, we can leverage a bit of standard Applicative
machinery to implement vector addition.
-- add :: V2 -> V2 -> V2
add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)
Vector subtraction, which gives you a way to find the distance between two points, is defined in terms of multiplication and addition.
-- minus :: V2 -> V2 -> V2
minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u
Dot products: foldable containers
2D Euclidean space is actually a Hilbert space - a vector space equipped with a way to measure lengths and angles in the form of a dot product. To take the dot product of two vectors, you multiply the components together and then add up the results. Once more, we'll be using Applicative
to multiply the components, but that just gives us another vector: how do we implement "adding up the results"?
Foldable
is the class of containers which admit an "aggregation" operation foldr :: (a -> b -> b) -> b -> f a -> b
. The standard prelude's sum
is defined in terms of foldr
, so:
-- dot :: V2 -> V2 -> Double
dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum $ liftA2 (*) v u
This gives us a way to find the absolute length of a vector: dot it with itself and take the square root.
-- modulus :: V2 -> Double
modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt $ v `dot` v
So the distance between two points is the modulus of the difference of the vectors.
dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)
N-ary zipping: traversable containers
An axis-aligned (hyper-)rectangle can be defined by just two points. We'll represent the bounding box of a set of points as a Pair
of vectors pointing to opposite corners of the bounding box.
Given a collection of vectors of components, we can find the opposite corners of the bounding box by finding the maximum and minimum of each component across the collection. This requires us to zip up, or transpose, a collection of vectors of components into a vector of collections of components. For this I'll use Traversable
's sequenceA
.
-- boundingBox :: [V2] -> Pair V2
boundingBox :: (Traversable t, Applicative f, Ord n) => t (f n) -> Pair (f n)
boundingBox vs =
let components = sequenceA vs
in Pair (minimum <$> components) (maximum <$> components)
Clustering
Now that we have a library for working with vectors, we can get down to the meaty part of the algorithm: dividing sets of points into clusters.
Partitioning
Let me rephrase the specification of the inner loop of your algorithm. You want to partition a set of points based on whether they're closer to the bottom-left corner of the set's bounding box or to the top-right corner. That's what partition
does.
We can write a function, whichCluster
which uses minus
and modulus
to decide this for a single point, and then use partition
to apply it to the whole set.
type Cluster = []
-- cluster :: Cluster V2 -> [Cluster V2]
cluster :: (Applicative f, Foldable f, Ord n, Floating n) => Cluster (f n) -> [Cluster (f n)]
cluster vs =
let Pair bottomLeft topRight = boundingBox vs
whichCluster v = dist v bottomLeft <= dist v topRight
(g1, g2) = partition whichCluster vs
in [g1, g2]
Repetition, repetition, repetition
Now we want to repeatedly cluster
until we don't have any groups larger than 5. Here's the plan. We'll keep track of two sets of clusters, those which are small enough, and those which require further sub-clustering. I'll use partition
to sort a list of clusters into those which are small enough and those which need subclustering. I'll use the list monad's >>= :: [a] -> (a -> [b]) -> [b]
(here [Cluster V2] -> ([V2] -> [Cluster V2]) -> [Cluster V2]
), which maps a function over a list and flattens the result, to implement the notion of subclustering. And I'll use until
to repeatedly subcluster until the set of remaining too-large clusters is empty.
-- smallClusters :: Int -> Cluster V2 -> [Cluster V2]
smallClusters :: (Applicative f, Foldable f, Ord n, Floating n) => Int -> Cluster (f n) -> [Cluster (f n)]
smallClusters maxSize vs = fst $ until (null . snd) splitLarge ([], [vs])
where
smallEnough xs = length xs <= maxSize
splitLarge (small, remaining) =
let (newSmall, large) = partition smallEnough remaining
in (small ++ newSmall, large >>= cluster)
A quick test, cribbed from @user2407038's answer:
testPts :: [V2]
testPts = map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> smallClusters 5 testPts
[
[Pair {px = 0.0, py = 0.0},Pair {px = 1.0, py = 0.0},Pair {px = 2.0, py = 1.0},Pair {px = 0.0, py = 2.0}],
[Pair {px = 5.0, py = 2.0},Pair {px = 5.0, py = 4.0},Pair {px = 4.0, py = 3.0},Pair {px = 4.0, py = 4.0}],
[Pair {px = 8.0, py = 2.0},Pair {px = 9.0, py = 3.0},Pair {px = 10.0, py = 2.0}]
[Pair {px = 11.0, py = 4.0},Pair {px = 12.0, py = 3.0},Pair {px = 13.0, py = 3.0},Pair {px = 13.0, py = 5.0}]
]
There you go. Small clusters in n-dimensional space, all without a single recursive function.
Labelling
Part of the point of working with the Applicative
and Foldable
interfaces, rather than working with V2
directly, was so I could demonstrate the following little magic trick.
Your original code represented points as 3-tuples consisting of two Double
s for the location and an Int
for the point's label, but my V2
has no label. Can we recover this? Well, since the code doesn't at any point mention any concrete types - just standard type classes - we can just build a new type for labelled vectors. As long as said type is a Foldable
Applicative
all of the above code will continue to work without modification!
data Labelled m f a = Labelled m (f a) deriving (Show, Functor, Foldable)
instance (Monoid m, Applicative f) => Applicative (Labelled m f) where
pure = Labelled mempty . pure
Labelled m ff <*> Labelled n fx = Labelled (m <> n) (ff <*> fx)
The Monoid
constraint is there because when combining actions you also need a way to combine their labels. I'm just going to use First
- left-biased choice - because I'm not expecting the points' labels to be relevant to the zipping operations like modulus
and boundingBox
.
type LabelledV2 = Labelled (First Int) Pair Double
testPts :: [LabelledV2]
testPts = zipWith (Labelled . First . Just) [0..] $ map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> traverse (traverse (getFirst . lbl)) $ smallClusters 5 testPts
Just [[0,1,2,3],[4,5,6,7],[8,9,10],[11,12,13,14]] -- try reordering testPts