0

So, I am trying to solve this sku segregation and routing problem for delivery. Following is the situation:

There is a single area hub or starting point for delivery,

Input:

  1. Geo address of customers - A (latitude, longitude) pair.
  2. Input order quantity per customer.

Problem:

  1. I want to make routes(loc1 --> loc2 --> loc7 -->... --> loc n) of not more than 50 locations each for one guy to deliver.
  2. I want to cluster those routes so that I know the quantity of SKUs to be dispatched to an area.

I tried using kmeans & hdbscan but it does not honour maximum cluster size.

Can I extrapolate this smart solution somehow to work in my case, mine seems more hierarchical to me.

kaulmonish
  • 408
  • 5
  • 17
  • 1
    This is the [Vehicle Routing Problem](https://en.wikipedia.org/wiki/Vehicle_routing_problem). It's even harder to solve exactly than its more famous cousin, the Travelling Salesman Problem. – j_random_hacker May 16 '18 at 11:01
  • @j_random_hacker yeah, seems like.... I tried to take a shot via clustering but didn't succeed much. Checking existing work on this. Will keep the thread updated.. – kaulmonish May 16 '18 at 11:06

1 Answers1

1

Claim: in a 2D space, a set of points belong the same cluster, if (and only if) BOTH their projections on OX are clustered, and their projections on OY are clustered. enter image description here

on OX, the projections of A,B,C,D are clustered; on OY, the projections of A,B,C,E are clustered; overall, A,B and C are clustered.

Determining which points are clustered, on a 1D space:

Let us have some points spread on the real axis. Intuitively, two points belong the same cluster if they are separated by a 'small distance'; if they are separated by a 'large distance', they belong different clusters. Now we must determine what is a 'large distance' and what is a 'small distance'. enter image description here

Let us sort all the distances, and look for a threshold. For the set of points with OX coordinates being [0,1,3,4,14,15,16,19,29,30,31], the distances between consecutive points will be [1,2,1,10,1,1,3,10,1,1]. The same set of distances, when sorted, will look like [1,1,1,1,1,1,2,3,10,10]. There is a threshold between 3 and 10, so we shall denote all distances <=3 as 'small distances', and all others as 'large distances'.

How do we pick the threshold? By doing the difference between two neighboring elements. In our example, 10-3=7 is the larges difference.

If there are multiple thresholds of equal values, pick the right-most; picking the right most threshold results in having few 'large distances', otherwise, many of the distances will be considered large, and, accordingly ,there will be many clusters. But it is up to your business requirements. You can group 30 locations as 3 clusters each of 10 locations, or 10 clusters of 3 locations each.

If the sorted distances are something like [2,4,6,8,10] (no candidate for threshold), then pick some percentile like the 75-percentile: the top quarter will be 'large distance', all the others - 'small distances'

Grouping points into clusters, according to their projections on OX:

Now that we know how to cluster real numbers, let us take the OX projections of the points, and cluster them.

Let us have the following points, along with their OX coordinates: P1(101), P2(12), P3(201), P4(13), P5(202), P6(11), P7(102);

Same points, sorted by their OX coordinate: P6(11), P2(12), P4(13), P1(101), P7(102), P3(201), P5(202);

Next, we shall do another grouping, by the OY projections.

Observation 1: When the point's indexes are sorted, the OX projections are not; when we sort the OX projections, now the indexes will not be sorted.

Observation 2: when doing the clustering using the OX projections, we should not expect to obtain the same clusters as when we cluster using the OY projections. In fact, we will intersect the obtained results. The clustering results are different because a point's OY coordinate is totally independent from its OX coordinate. Thus, a totally different set of values on the OY axis.

Determining the actual clusters:

We have previously done some clustering on both the OX and OY projections, obtaining different groupings of the same points. Now we will intersect these clusters, looking for common points.

Coming back to our first picture, clustering after OX gives (A,B,C,D) cluster, after OY we get (A,B,C,E), the intersection of these sets will be (A,B,C) - the final cluster. But this was a simple example.

The general strategy is to do a carthesian product of the clusters on OX, with the clusters obtained on OY. For each such element of the carthesian product, we will intersect the elements in the OX cluster, with the elements on the OY cluster. If we got 3 clusters on OX and 4 clusters on OY, the carthesian product will have 12 elements. Let's pick one of them. It is a pair of two clusters A and B: A is a cluster on OX, B is a cluster on OY. If A and B have some points in common, then these points are indeed a cluster. enter image description here

In the above example, from 7 points, we have obtained only one single cluster. Not impressive. But we can further join neighbor clusters. Let's not forget that the orange segments represent 'small distances', while the black segments represent 'large distances'. In the above picture, from the 'cluster' P1 to the clusters P5 or P3+P7, there is only one 'large distance'. From the cluster P3+P7, to the cluster P4, there are two large distances plus one small distance.

disclaimer: this procedure treats the world map as a rectangle (meridians are parallel lines which will never intersect). Also, instead of viewing the meridians of 1 degree and 179 degrees as close to each other, they will appear really far apart. (the distance between them will be 178 degrees, instead of 2 degrees) However, this is not an issue, because most of the time, the act of delivery has a regional nature, or is, at most, country-level. As long as your country is not traversed by the 180 meridian, you are just fine.

Newton fan 01
  • 484
  • 5
  • 14
  • You also assume that one degree latitude = one degree longitude. But consider New York, the error is already 30% or so. The 180 degree wrap-around isn't the only issue with treating latitude and longitude as xy coordinates. – Has QUIT--Anony-Mousse May 27 '18 at 07:30
  • Also you are very optimistic that not all of X is dense. But consider you have a major road running east-west where deliveries cluster, your entire x axis will be one big "cluster" and you didn't gain anything. – Has QUIT--Anony-Mousse May 27 '18 at 07:33
  • regarding 1st comment: I have defined the notions of 'small distance' and 'big distance', with a threshold separating them. While I calculate these thresholds on both OX and OY, I never assumed they were equal. In fact, I do expect different values for the thresholds on OX and OY. The difference between these thresholds can be far larger than 30% . Perhaps that's a downside of the algorithm, as it would rather join two points whose distance on OY is 2km, instead of joining two points whose distance on OX is 1km, as long as 2km is a small distance on OY, while 1 km is a large distance on OX – Newton fan 01 May 29 '18 at 09:04
  • regarding 2nd comment: NOT AT ALL, there will not be one single cluster. My second picture shows that on OX there will be 3 clusters. On OY, obviously, there will be one single cluster. When doing the carthesian product, there will be 3 sets. The final clusters will be the intersection of a cluster on OX, with the cluster on OY (which is all the points); such intersection will be identical to the cluster on OX; so, 3 clusters in the end, not one – Newton fan 01 May 29 '18 at 09:07
  • I have skipped your point about the density. But i have treated that aspect, already: <> – Newton fan 01 May 29 '18 at 09:12
  • In the example I outlined, there is no separation between "small" and "large" distances. The data would be approximately uniform on x. So one would need to use the percentile, and that would just produce random partitions. A test case would be points uniformly along a sine curve, with uniform marginals. There is a good reason why grid based approaches are not used much in literature. – Has QUIT--Anony-Mousse May 29 '18 at 18:35