2

I've been puzzling over this for a while now and finally decided to see if anyone can think of an efficient implementation (I'm not sure if there is one).

Given a series of intervals (for example, the following):

Image

What is the largest subset of intervals that overlaps multiple times? In this case, the answer would be [B, C, D], since [A, B, C, D] overlaps only once, and the other repeatedly overlapping subintervals are smaller than 3 elements.

Edit 9/28: Here's a more complex example:

Complex Image
(source: i.ibb.co)

In this case the answer is [B, D, E, F, H], as there are multiple points traversing this table from left to right in which this subset is overlapping.

Corner Case

Note also in this case, the answer would be [A, B, C].

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Makiah
  • 73
  • 6
  • what are the variables of the problem and are they free variables or they have a cap? e.g: types of intervals (fixed < 10), amount of intervals (free). – Eloy Pérez Torres Sep 27 '20 at 23:21
  • There are an arbitrary number of intervals over an arbitrary number of entities (A, B, ...). – Makiah Sep 28 '20 at 00:20
  • This is related: https://stackoverflow.com/questions/15013800/find-the-maximally-intersecting-subset-of-ranges/15059229#15059229. We can modify it to answer this. – Dave Sep 28 '20 at 01:54
  • What are the labels A,B,C,D? Can several intervals share the same label? In your example there are two intervals marked A, two marked B, and so on. Do intervals with the same label ever intersect? – ciamej Sep 28 '20 at 16:26
  • Just added a more complex example to demonstrate another way intervals could be configured. Intervals with the same label cannot intersect. {A, B, ...} can be considered boolean values over some timeframe (traversing the table from left to right). Intervals can't intersect. – Makiah Sep 28 '20 at 18:12
  • @ciamej You are right; I misinterpreted the question. – Dave Sep 28 '20 at 19:11

2 Answers2

1

Below is my solution implemented in Java. It's similarly to Dave's approach a sweep line algorithm.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class IntervalSolver {

     public static void main(String []args){
        ArrayList<Interval> intervals = new ArrayList<>();
        
        final int A = 0;
        final int B = 1;
        final int C = 2;
        final int D = 3;
        
//        intervals.add(new Interval(1, 10, A));
//        intervals.add(new Interval(9, 12, B));
//        intervals.add(new Interval(11, 15, C));
//        intervals.add(new Interval(11, 18, D));
//        intervals.add(new Interval(26, 29, A));
//        intervals.add(new Interval(20, 30, B));
//        intervals.add(new Interval(28, 33, C));
//        intervals.add(new Interval(27, 33, D));
        
//        intervals.add(new Interval(1, 10, A));
//        intervals.add(new Interval(1, 10, B));
//        intervals.add(new Interval(1, 10, C));
//        intervals.add(new Interval(1, 10, D));
//        intervals.add(new Interval(16, 25, A));
//        intervals.add(new Interval(16, 25, B));
//        intervals.add(new Interval(16, 25, C));
        
        intervals.add(new Interval(1, 7, A));
        intervals.add(new Interval(1, 10, B));
        intervals.add(new Interval(1, 10, C));
        intervals.add(new Interval(1, 10, D));
        
        compute(intervals, 4);
     }
     
     public static class Interval {
         public int start;
         public int end;
         public int entity;
         
         public Interval(int start, int end, int entity) {
             this.start = start;
             this.end = end;
             this.entity = entity;
         }
     }
     
     public static class Point implements Comparable<Point> {
         boolean start;
         public Interval interval;
         
         public int pos() {
             return start ? interval.start : interval.end;
         }
         
         public Point(Interval interval, boolean start) {
             this.interval = interval;
             this.start = start;
         }
         
         @Override
         public int compareTo(Point other) {
             int diff = pos() - other.pos();
             if (diff != 0)
                return diff;
             return (start ? 0 : 1) - (other.start ? 0 : 1);
         }
     }
     
     public static class Solution {
         public Interval[] intervals;
         
         public Solution(Interval[] intervals) {
             this.intervals = intervals;
         }
         
         public int value() {
             int value = 0;
             for (int i = 0; i < intervals.length; i++)
                 value += intervals[i] != null ? 1 : 0;
             return value;
         }
         
         @Override
         public boolean equals(Object other) {
             Solution s = (Solution)other;
             if (s.intervals.length != intervals.length)
                 return false;
             
             for (int i = 0; i < intervals.length; i++)
                 if ((intervals[i] == null && s.intervals[i] != null) || (intervals[i] != null && s.intervals[i] == null))
                     return false;
             return true;
         }
         
         @Override
         public int hashCode() {
             int hashCode = 0;
             for (int i = 0; i < intervals.length; i++)
                 hashCode = hashCode << 1 | (intervals[i] != null ? 1 : 0); 
             return hashCode;
         }
         
         @Override
         public String toString() {
             var sb = new StringBuilder();
             sb.append('[');
             for (int i = 0; i < intervals.length; i++) {
                 sb.append(intervals[i] != null ? '1' : '0');
                 if (i < intervals.length - 1)
                     sb.append(',');
             }
             sb.append(']');
             return sb.toString();
         }
     }
     
     public static void compute(List<Interval> series, int entities) {
         ArrayList<Point> points = new ArrayList<>();
         for (var i : series) {
             points.add(new Point(i, true));
             points.add(new Point(i, false));
         }
         points.sort(null);
         
         HashMap<Solution, Integer> solutions = new HashMap<>();
         Interval[] currentIntervals = new Interval[entities];
         
         for (int i = 0; i < points.size(); i++) {
             var p = points.get(i);
             
             if (p.start)
                 currentIntervals[p.interval.entity] = p.interval;
             else
                 currentIntervals[p.interval.entity] = null;
             
             if (i < points.size() - 1 && p.pos() == points.get(i + 1).pos())
                 continue;
             
             var copy = new Solution(currentIntervals.clone());
             int count = solutions.getOrDefault(copy, 0);
             solutions.put(copy, count + 1);
         }
         
         long maxValue = 0;
         Solution best = null;
         for (var entry : solutions.entrySet()) {
             var solution = entry.getKey();
             var count = entry.getValue();
             
             if (count == 1)
                 continue;
             
             long value = solution.value();
             if (value > maxValue) {
                 maxValue = value;
                 best = solution;
             }
         }
         
         // extra intersections:
         for (var entry : solutions.entrySet()) {
             var solution = entry.getKey();
             var count = entry.getValue();
             
             if (count > 1)
                 continue;
             
             long value = solution.value();
             if (value <= maxValue)
                 continue;
             
             for (var innerEntry : solutions.entrySet()) {
                 var innerSolution = innerEntry.getKey();
                 var innerCount = innerEntry.getValue();
                 
                 if (solution == innerSolution || innerCount > 1)
                     continue;
                 
                 long innerValue = innerSolution.value();
                 if (innerValue <= maxValue)
                     continue;
                 
                 var merge = new Solution(solution.intervals.clone());
                 for (int i = 0; i < entities; i++) {
                     var int1 = solution.intervals[i];
                     var int2 = innerSolution.intervals[i];
                     if (int2 == null || int1 == int2)
                         merge.intervals[i] = null;
                 }
                 
                 long mergeValue = merge.value();
                 if (mergeValue > maxValue) {
                     maxValue = mergeValue;
                     best = merge;
                 }
             }
         }         
         
         System.out.println("Best: " + best);
     }
}

It prints Best: [false, true, true, true] which means BCD.

It could be easily extended to print the exact coordinates where the intersection occurs.

It's complexity is O(n*lg n + nk) where n is the number of intervals and k is the number of entities. If k = O(n), then the whole complexity becomes O(n^2).

Edit: Changed the behaviour to account for the update in the requirements from the OP. The idea is based on taking each pair of potential solutions which occur only once and intersecting them. Their intersection by definition occurs at least twice. If it's value is larger than the current best it becomes the new result. The complexity has risen to O(n*lg n + n^2*k) where n is the number of intervals and k is the number of entities. If k = O(n), then the whole complexity becomes O(n^3).

Edit2: Fixed the code for extra intersections, by adding the requirement that the intervals for the same entity must be distinct.

ciamej
  • 6,918
  • 2
  • 29
  • 39
  • 1
    Just added a third case to the question: I think this might be on the right track but I don't believe this would work in cases where each of the intervals ends at the same time period. In the third case, wouldn't the map be {[A, B, C, D]: 1, [B, C, D]: 1}, at which point this would return an empty set? – Makiah Sep 28 '20 at 19:03
  • Yeah, I thought that was the expected behaviour ;-) I will think about it... – ciamej Sep 28 '20 at 19:12
  • @Makiah Take a look now. – ciamej Sep 28 '20 at 19:56
  • This seems promising, but if we use intervals A: [1, 7], B: [1, 10], C: [1, 10], D: [1, 10], it counts [true, true, true, true] and [false, true, true, true] as two separate intersections, and therefore returns [false, true, true, true] even though B, C, D occurs in only one contiguous interval. I think this might be really close though: going to see if I can adjust this a bit to make it work for those cases. – Makiah Sep 29 '20 at 13:58
  • @Makiah I guess it would be necessary to hold not only true/false values for the intersections, but also references to actual Intervals. Then we'd have to augment the merge procedure `intersection.get(i) && innerIntersection.get(i)` with the condition that the shared interval cannot be exactly the same. Do you think that would satisfy your requirements? – ciamej Sep 29 '20 at 20:04
  • The problem as defined now is quite complicated and sometimes it's difficult for me to turn my head around it ;-) – ciamej Sep 29 '20 at 20:07
  • @Makiah take a look now. – ciamej Sep 30 '20 at 14:10
  • @Makiah If it solves your problem, you can accept the answer :) – ciamej Sep 30 '20 at 15:31
  • It was fun, thanks for bringing an interesting problem :) – ciamej Sep 30 '20 at 15:33
0

In O(n log n) per set of n intervals, you can get all of the intersecting subsets (see Find the maximally intersecting subset of ranges and add the needed bookkeeping). Note that there are about 2n of these where n is the number of points in the set (fewer if intervals share starting or ending points).

We can limit our consideration to the n sets of overlapping points per set of intervals formed by adding an interval. That gets us from 2n down to n. We can also limit our consideration to sets immediately prior to an element being removed. This may further reduce the number of sets but isn't guaranteed to do so.

Now we need to find the pair of sets with max intersection.

Naively this is O(n^2).

Example: Applying this to the OP's example, we end paragraph 2 of this answer with the following sets of intervals (using the reductions in the second paragraph):

I: {{A,B}, {B,C,D}}
II: {{B,A,C,D}}

Then, we find the max intersection of a set from I with a set from II: {B,C,D}

Dave
  • 7,460
  • 3
  • 26
  • 39
  • How about the requirement that "subset of intervals that overlaps multiple times"? – ciamej Sep 28 '20 at 02:39
  • @ciamej This may be confusing because there are multiple sets. First (using the linked solution), we find all sets of intersecting intervals per set of input intervals in O(n log n). I'm assuming 2 sets of input intervals, so now we have 2 sets of intersections, where each intersection is a set of intersecting interval labels. Finally, we want the max intersection of an element from each of these sets. I added an example to clarify. – Dave Sep 28 '20 at 02:44
  • How do you keep these sets of sets? `{{A,B}, {B,C,D}}` may consume `n^2` space, as the number of intervals is not bounded. Then intersecting them will be `n^4`, won't it? – ciamej Sep 28 '20 at 02:52
  • Why not simply process each of the sorted 2n points by putting into a hashmap a list of all the currently overlapping intervals, and then dropping the lists that occurred only once, and finally selecting the longest list? – ciamej Sep 28 '20 at 03:00
  • @ciamej The number of sets of overlapping intervals is bounded to 2n, or n if you limit to those formed by adding an interval as I suggest. (That's per input set). – Dave Sep 28 '20 at 03:16
  • @ciamej That works for the max overlap of a single set of intervals, but not for the max subset that overlaps twice. – Dave Sep 28 '20 at 03:18
  • Yes, I agree, the number of sets is bounded to 2n, but each such set can contain O(n) intervals. – ciamej Sep 28 '20 at 03:30
  • I think you misunderstood what I meant. I proposed to put into a hash map the following sets of intervals `{A}, {A,B}, {B}, {B,C,D}, {B}, {A,B}, {A,B,D}, {A,B,C,D}, {B,C,D}, {C,D}`. The hashmap has a set of intervals as the key, e.g. `{B,C,D}` and number of occurences as the value. In the end, we choose the longest key in the hashmap whose value is greater than one. – ciamej Sep 28 '20 at 03:37
  • To clarify what I meant two comments above: `{A,B}` is size 2, `{B,C,D}` is size 3, but in general such a set can be `{A, B, C, D, E, F, G, ...}` and we're not bound by the letters of the alphabet ;-) such a set of intervals can have `O(n)` size. – ciamej Sep 28 '20 at 03:44
  • @ciamej Say the first set of intervals have varying starts but a common end, so we get: {A}, {A,B}, {A,B,C}, {A,B,C,D}. For the second set, A is isolated, and B, C, D have varying starts but a common end, so we get: {A}, {B}, {B,C}, {B,C,D}. Here only {A} will have a count of 2, but {B,C,D} is the right answer. – Dave Sep 28 '20 at 16:08
  • I think we might have a different understanding of the question. By two sets of interval do you mean the left and right parts of the figure in the OP? I believe there is only one set of input intervals, and the fact there is a gap in the figure is only a coincidence. I think the input consists of multiple streams of intervals; each stream with its own label A,B,C or D. But there number of such streams is unbounded. Do you understand the question as me? Maybe we should ask the OP how it should be understood? – ciamej Sep 28 '20 at 16:23
  • @ciamej is correct in this case, the fact that there is a gap above is more of a coincidence in how I drew the example problem. That said I think Dave is also on the correct path in this case, but unfortunately the same solution I initially came up with: create a map of all possible subset combinations we see and increment them as they occur (and getting all combinations is O(n!) at worst case, no?). Perhaps that's the only way to solve this problem. – Makiah Sep 28 '20 at 17:46
  • @Makiah I have a different solution in mind and I will soon post it. – ciamej Sep 28 '20 at 17:57