Algorithm for shortest path from muptiple Sets

Question

This is an interview Question (I saw it on a forum and am not able to figure out the best solution). The problem is from a given Set of numbers find the shortest path.

eg.

Set A - [2, 14, 34]
Set B - [9, 13]
Set C - [15, 22, 62, 78]
Set D - [16, 24, 54]
Set Z - [17, 38, 41]

1) There can be any number of sets

2) The numbers inside the set will never repeat.

3) The numbers can range from any start to any end (they are not between 0 - n, i.e. It can start from 1091 to 1890 etc)

4) All the sets are sorted.

in the above example the path will be:

B[13] -> A[14] -> C[15] -> D[16] -> Z[17]

The shortest path is defined as the difference between MAX number (17) - MIN Number (13) = 4;

Any ideas ?

That is not the shortest, that is the longest path. Or what do you mean? — Igor Chubin, Jul 27 '12 at 10:54
The shortest is MAX[17] - MIN[13] is 4 and ALL NUMBERS are between MIN and MAX (no other combination in sets will give the same result) — Vivek, Jul 27 '12 at 11:02
"The numbers inside the set will never repeat." I need to clarify whether "the" here refers to "each" or "all". Other than this, it's still unclear to me what's meant by this question. Should I traverse ALL sets, taking only one value from each in increasing order and the first and last value when substracted gives the smallest result. Is that what you mean? — LeleDumbo, Jul 27 '12 at 11:07
@LeleDumbo (1) "for all" (2) we have to pick one element from each set. — Vivek, Jul 27 '12 at 11:37
How to define a path? Your question isn't clear. There must have to be some node, and connection between them. What are the nodes in your question? `from a given Set of numbers find the shortest path` - It's not possible, but from a given graph you can find shortest path. — Rupak, Jul 27 '12 at 11:42
In other words problem is to find closest elements from all of the sets, right ? — Rrr, Jul 27 '12 at 17:38

maniek · Answer 1 · 2012-07-27T13:50:53.093

3

make a list of pairs [number, name_of_set]. Sort it.

For a given length of path, D, scan the sorted list, keeping 2 pointers. Always increase the first pointer, and increase the second while the the spread is larger than D. While scanning, keep counts of elements between pointers belonging to each set. if there is element from each set, bingo, You found a path with difference at most D.

Now, binary search for D.

overall complexity O(N log N)

edited Jul 27 '12 at 13:50

answered Jul 27 '12 at 12:30

maniek

7,087
2
20
43

You need exactly one element from each set, how do you make sure of that? – IVlad Jul 27 '12 at 13:04
If a set contains two elements in the given range you can pick any one, the question doesn't specifically forbid that. – Generic Human Jul 27 '12 at 13:14
Nevermind, I misread the post. Shouldn't "increase the second while the the spread is larger than D." be "increase the second while the the spread is **not** larger than D"? Otherwise looks good, +1. – IVlad Jul 27 '12 at 13:25
@IVlad What I meant is whenever You increase the first pointer, the spread can get larger than D. Then You need to increase the second pointer, to make the spread within range. – maniek Jul 27 '12 at 13:53
How can that happen? Since the list is sorted, increasing the first pointer can only reduce the spread. Unless you mean the first pointer is **after** the second, which is confusing in my opinion. – IVlad Jul 27 '12 at 14:04
How do you know D upfront ? Local optimum wont give you ability to find global optimum. For example s1{10, 25}, s2{15, 31}, s3{1, 36} – Rrr Jul 27 '12 at 18:09
@RuslanDzhabbarov You don't. That's why You use binary search to find it – maniek Jul 27 '12 at 22:35
If you don't know what to find, what you are going to find using BS ? – Rrr Jul 27 '12 at 22:48
The whole 2-pointer-scan is INSIDE the binary search, i.e. D is the midpoint of the scan. Now that I think of it, you can get rid of the binary search by slightly modifying the scan. This is left as an exercise to the reader :) – maniek Jul 27 '12 at 23:31

score 1 · Answer 2 · answered Jul 29 '12 at 06:59

a heap (priority queue) might help.

merge sort all data into an array N, also keep original set id, assume there are m sets in total;
int shortest = MAX(N) - MIN(N); // that is N[N.length - 1] - N[0]
init a heap h;
loop through N with i, if h does not contain element from the same set as N[i], add N[i] to heap; if h already contains an element from the same set, say h[k], increase the key of h[k] to N[i]. If h.size() == m, shortest == N[i] - h[0] < shortest ? N[i] - h[0]: shortest.

here is code:

mergesort(all_sets[], H, S); // H holds all data, S holds corresponding setid.
Heap<key, setid> H = new Heap<key, setid>();
int shortest = N[N.length - 1] - n[0];
for(int i = 0; i < N.length; i++)
{
   int data = N[i];
   int setID = S[i];
   int hindex = H.elementFromSet(setID);
   if(hindex < 0)
    { // H does not have any element from set with setID;
       H.add(data, setID);
    } else {
       H.increase(data, hindex);
    }
    if(H.size() == m)
    {
       shortest = shortest > N[i] - H[0]? N[i] - H[0] : shortest;
    }
}

Maybe I can use a hashtable to keep tracking set id to heap index.

the runtime I believe is O(nlgm).

score 0 · Answer 3 · answered Jul 27 '12 at 11:11

Take Set A and Set B . Find shortest path in this set. This will be 14-13 . Now sort it , so that it becomes 13- 14. Now the short set short = {13,14}
Take short set { 13, 14} and set C {15,22,62,78}. Now start node is 13 and end node is 14 in the short set. Beginning from the end node 14 the shortest reachable path is 15. So add the 15 to the short set. Now short set becomes { 13, 14 , 15}, sort it so that it remains {13, 14, 15}
Now take short set {13,14,15} and set D { 16 , 24 , 54} The end node in short set is 15. So we begin from there. Now shortest path from 25 to set D is 16. So add 16 to the short set. Now short set becomes { 13,14,15,16}. Sort it . It remains {13,14,15,16}

3 . We can repeat this for the entire sets to get the resultant short set.

doesn't work. Try adding a 1 to B. Now You have [1,2] - how do You go from [1,2] to [13,14,15] ? — maniek, Jul 27 '12 at 12:22

score 0 · Answer 4 · edited May 23 '17 at 11:48

You can apply essentially the same idea as the algorithm I described in this question.

Let's look for the center of the final subset. It must minimize the maximum distance to each of the sets. As usual, the distance of a point to a set is defined as the minimum distance between the point and an element of the set.

For each set i, the function fi describing the distance to the set is piecewise linear. If a,b are two consecutive numbers, the relations fi(a) = 0, fi((a+b)/2) = (b-a)/2, fi(b) = 0 let us build a description of all the fi in linear time.

But we can also compute the maximum of two piecewise functions fi and fj in linear time, by considering the consecutive intervals [a,b] where they are linear: either the result is linear, or it is piecewise linear by adding the unique intersection point of the functions to the partition. Since the slopes of our functions are always +1 or -1, the intersection point is a half-integer so it can be represented exactly in floating-point (or fixed-point) arithmetic.

A convexity argument shows that the maximum g of all the fi can only have at most twice as many points as the fi, so we don't have to worry about the maximum having a number of points that would be exponential in the number of sets.

So we just:

Compute the piecewise linear distance function fi for i = 1..p.
Compute the maximum g of all the fi by repeatedly computing the maximum.
The location of any minimum point of g is the desired center.
For each set, pick the closest point to the center.
The width of the set of points we picked is exactly the minimum of g :-)

Complexity is O(N) if the number of sets is bounded, or O(N p) if the number of sets p is variable. By being smart about how you compute the maximum (divide-and-conquer), I think you can even reduce it to O(N log p).

score 0 · Answer 5 · answered Jul 28 '12 at 14:23

Here's an alternative formulation of the problem.

Q: Find the smallest interval which contains an element from all the sets.

A:

Put all the elements in a single bucket and sort them. Complexity O(N*K)
We will find the largest number such that there is at least one element from each set higher than this number by binary search. This will be MIN.
Similarly, find the smallest number such that there is at least one element from each set smaller than this number by binary search. This will be MAX.

As an optimization, you can store each set as an interval, and use an interval tree for queries in steps 2 and 3. That way the query complexity changes from O(K) to O(log K)

Algorithm for shortest path from muptiple Sets

5 Answers5