7

If you have a set of ranges, such as the following simple example...

[
    [12, 25], #1
    [14, 27], #2
    [15, 22], #3
    [17, 21], #4
    [20, 65], #5
    [62, 70], #6
    [64, 80]  #7
]

... how do you compute the maximally intersecting subset (not sure quite how to phrase it, but I mean "the subset of ranges which intersects and has the highest cardinality") and determine the degree of intersection (cardinality of ranges in that subset)?

Logically I can work it out, and might be able to translate that to a naive algorithm. Going down the list, we see that 1-5 intersect, and 5-7 intersect, and that #5 intersects both sets.

The result I want is simply the subset, since that gives me the information about the cardinality, and I can easily compute the intersection of the set as long as they all intersect. In the above example, it would be [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]].

Off the top of my head, I might try converting each range to a graph node, connecting the ones which are intersecting, and finding the largest fully-connected graph.

I was also thinking iteratively to start at the beginning, continue building up a list of intersecting ranges with a running intersection on each to check against—until you hit an element which doesn't intersect, then start a new list. Continue checking each item against the existing intersections. However I'm not sure this is complete.

I could take a stab at implementing something (lang is ruby FWIW), but I would love to hear how others might solve this problem, and what the most efficient and elegant way might be.

Update:

I believe this is a specific case of the Maximum clique problem, which is NP-hard and thus actually difficult. Suggestions for approximations/real-world use would be most appreciated!

See also: http://en.wikipedia.org/wiki/Maximum_clique / Find all complete sub-graphs within a graph

Update 2

Found a nice proof of this problem's NP-hardness and NP-completeness here: http://www.cs.bris.ac.uk/~popa/ipl.pdf

Looks like this is the end of the line then. Sorry folks! I'll work with a good-enough greedy approximation. Thanks.

As said in the answers I don't think that paper describes this problem... we probably have more information here based on the ranges.

Community
  • 1
  • 1
trisweb
  • 8,814
  • 3
  • 32
  • 22
  • Last semester in a course on efficient algorithms and data structures, we did this using an augmented AVL tree; the idea is to order the starting and end points of the intervals, put those points into the tree and for each point also store the information whether it is a starting or end point. You might take a look at what is called "interval tree". – G. Bach Feb 21 '13 at 23:23
  • Thanks - that is very interesting. That looks like it works great to find all intersecting ranges within the whole set, but I think what I'm trying to solve is a specific case of the Clique problem: http://en.wikipedia.org/wiki/Clique_problem – trisweb Feb 21 '13 at 23:28
  • 1
    As a side note, "finding the largest fully-connected graph" is probably another formulation for finding the largest clique in a graph, which is an NP-complete problem. – G. Bach Feb 21 '13 at 23:28
  • @G.Bach: you got it, I think it is NP complete, and thus actually difficult. So, let's talk real world and approximations :) – trisweb Feb 21 '13 at 23:30
  • @trisweb Let me see if I understood your problem correctly: you have a set of intervals, and you want to find some interval (not necessarily in your set) which intersects with the biggest number of your intervals? Given your example, that would be [20,21] since intervals 1-5 intersect there, correct? – G. Bach Feb 21 '13 at 23:35
  • @trisweb: do you mean you want a subset where each pair of ranges in the subset has non-empty intersection? That would indeed be a clique problem. – Fred Foo Feb 21 '13 at 23:48
  • @larsmans - not just each pair of ranges has a non-empty intersection, but the entire *set* of ranges together has a non-empty intersection. – trisweb Feb 21 '13 at 23:50
  • 1
    This can be formulated as a clique problem, which will give you more trouble than the problem actually is; however you can do it in polynomial time. I'll see whether there is a paper regarding this or not, the solution I have is I'm afraid copyrighted. – G. Bach Feb 21 '13 at 23:51
  • The most useful algorithm is probably Bron–Kerbosch - http://en.wikipedia.org/wiki/Bron%E2%80%93Kerbosch_algorithm – trisweb Feb 21 '13 at 23:54
  • @larsmans He probably meant the specific implementation code is not allowed to be copied; the algorithm itself might be, if generalized or re-written. – trisweb Feb 21 '13 at 23:55
  • Here's a solution to that problem giving you the point of maximum overlap; it should be easily adapted to give you the number of intervals intersecting there as well as the start of the interval you're interested in: www.comp.nus.edu.sg/~sma5503/handouts/ps4Sol.ps – G. Bach Feb 22 '13 at 00:02
  • Careful, the maximAL clique problem is not the same as the maximUM clique problem; the former searches for cliques that are maximal, meaning no other vertex not in it can be added to it without breaking the clique property, the latter searches for the largest clique in the graph. – G. Bach Feb 22 '13 at 00:07
  • @G.Bach thanks for your help - I did get a MaxiMUM clique brute-force solution working but looking to implement the other solution below for efficiency... – trisweb Feb 22 '13 at 14:23

2 Answers2

17

If I understand the problem correctly, it is not an instance of the NP problem described in the paper you linked to. Here is my understanding of the problem, and a polynomial-time solution.

  1. We are given a finite set of ranges of real numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai ≤ Bi.

  2. Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point.

In your example, this would be: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-

  1. Initialize curOverlap and maxOverlap to zero.

  2. Iterate through the list, incrementing curOverlap for each + and decrementing it for each -. Set maxOverlap = max(curOverlap, maxOverlap) on each increment.

To continue your example:
val, cur, max
12, 1, 1
14, 2, 2
15, 3, 3
17, 4, 4
20, 5, 5
21, 4, 5
22, 3, 5
25, 2, 5
27, 1, 5
62, 2, 5
64, 3, 5
65, 2, 5
70, 1, 5
80, 0, 5

The max overlap is 5. You could also store the val associated with the max if you wanted to know where the max overlap occurred. That would give you 20. in this example. It's then trivial to go through the initial set of ranges and find the 5 which include 20.

-edit- If you have repeated values, count the plusses before the minuses for each value so that you include ranges that overlap at a single point.

MC Emperor
  • 22,334
  • 15
  • 80
  • 130
Dave
  • 7,460
  • 3
  • 26
  • 39
  • 1
    The OP seems to be talking about the subset of input ranges that intersect and not the intersection itself. – Vaughn Cato Feb 25 '13 at 02:46
  • 3
    I know. Once you have an element from a maximal intersection, you can go through the ranges in linear time and identify which ones contain said element. – Dave Feb 25 '13 at 12:14
  • I think you are right -- you just need to show that if a set of ranges have pairwise intersection with each other, that the combined intersection is non-empty. – Vaughn Cato Feb 25 '13 at 14:13
  • I will implement and test! Thank you. – trisweb Feb 25 '13 at 14:44
  • Also I believe the proof lies somewhere in that we're finding the *point* of maximum intersection, and all ranges which intersect at the same point necessarily intersect with each other. – trisweb Feb 25 '13 at 15:57
  • Yep, this is the one, and makes perfect sense as well. Thanks again. – trisweb Feb 26 '13 at 15:27
4

Here is a Java implementation of the solution proposed by Dave:

public static List<Range> getMaxIntersectingSubset(List<Range> ranges) {
    // First, we collect "a sorted list of the starting and ending points".
    // We're sorting such that all lower bounds come first
    List<Bound> intersections = Arrays.stream(ranges)
        .flatMap(arr -> Stream.of(Bound.ofLowerBound(arr[0]), Bound.ofUpperBound(arr[1])))
        .sorted(Comparator
            .comparing(Bound::value)
            .thenComparing((a, b) -> Boolean.compare(!a.isLowerBound(), b.isLowerBound())))
        )
        .collect(Collectors.toList());

    long curOverlap = 0;
    long maxOverlap = 0;
    List<Integer> indexes = new ArrayList<>();

    // Then we iterate the list, searching for the highest overlap
    for (int i = 0; i < intersections.size(); i++) {
        Bound intersection = intersections.get(i);
        curOverlap += (intersection.isLowerBound() ? 1 : -1);
        if (curOverlap > maxOverlap) {
            maxOverlap = curOverlap;
            indexes.clear();
            indexes.add(i);
        }
        else if (curOverlap == maxOverlap) {
            indexes.add(i);
        }
    }

    return indexes.stream()
        .map(index -> Range.of(intersections.get(index).value(), intersections.get(index + 1).value()))
        .collect(Collectors.toList());
}
public record Bound(int value, boolean isLowerBound) {

    public static Bound ofLowerBound(int value) {
        return new Bound(value, true);
    }

    public static Bound ofUpperBound(int value) {
        return new Bound(value, false);
    }
}
public record Range(int start, int end) {

    private Range(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start > end");
        }
        this.start = start;
        this.end = end;
    }

    public static Range of(int start, int end) {
        return new Range(start, end);
    }
}

These are some test sets:

  1. From this Stackoverflow question

    int[][] ints = {
        { 1, 100 },
        { 30, 95 },
        { 10, 60 },
        { 15, 25 },
        { 33, 66 },
        { 20, 50 },
        { 51, 100 },
        { 25, 70 }
    };
    

    According to OP of abovementioned question, the result should be [ 30, 55 ]. The provided code outputs this result. However, there's another overlap, although less wide. See pictures below.

    Test case #1a

    Test case #1b

  2. OP of this question mentions this list:

    int[][] ints = {
        { 12, 25 },
        { 14, 27 },
        { 15, 22 },
        { 17, 21 },
        { 20, 65 },
        { 62, 70 },
        { 64, 80 }
    };
    

    Here, the output is [ 20, 21 ]. This matches the below diagram:

    Diagram of test case #2

MC Emperor
  • 22,334
  • 15
  • 80
  • 130