2

I have an application where I have a series of non overlapping fixed width intervals, each of which have a given key. Each interval has the same width, and there may be contiguous intervals. Essentially I want to group the intervals and keys in such a way that I minimize the amount of separate intervals. This could be done by merging contiguous intervals with the same key or looking for matching intervals and combining them into single intervals with multiple keys. My current algorithm tries the above two approaches and sees which results in the smallest number of intervals, but I feel like there must be a smarter way of approaching the problem. Any advice would be much appreciated!

For example:

|-----| |-----| intervals with key k1 (contiguous)

|-----| interval with key k2

|-----| interval with key k3

In this problem the intervals with key k1 could be merged to a single contiguous interval, resulting in 3 instead of four total intervals. Or the first interval for each key k1, k2 and k3 could be combined to one interval with keys k1, k2 and k3, resulting in one interval plus the second remaining interval in k1.

Worst case would be something like 70,000 intervals.

iratemike
  • 37
  • 5

1 Answers1

0

An idea for good approximation solution. Since intervals of fixed width input data can be represent as bitmask with intervals on one (X) axis and keys on other (Y) axis. For you data like:

k3  1  0
k2  1  0
k1  1  1
   i1 i2

Problem is similar to partitioning rectilinear polygon problem. There is a very good answer that covers the topic.

This is not exact problem since in this case order of keys is not important, what can be seen in example:

k3  1  1
k2  1  0
k1  1  1
   i1 i2

Partitioning will produce find result with 3 rectangles, and solution should be 2.

Simple solution to this (also approximation) is to make output post-processing and join rectangles on same intervals. Here will help pre-processing that reorders keys so that keys with similar 'covering' are nearer or neighbours.

More complex solution is (for which I am not 100% it works) is to use idea from algorithm for partitioning rectilinear polygons. Idea is to:

  • take all possible splits (cords),
  • create graph with cords as vertices and edges where cords intersect,
  • find maximal (cord) independent set.

In original case, each inner (270deg) corner produces two cords in two directions. Since key ordering is not important, I think that Z cords should go through whole 'geometry'. That means cords are:

  • one key continues intervals (X direction)
  • interval ends (Z direction, goes through whole geometry).
Community
  • 1
  • 1
Ante
  • 5,350
  • 6
  • 23
  • 46
  • Excellent answer, thank you very much Ante. Lots of interesting reading material on this rectilinear polygon problem! – iratemike Nov 22 '16 at 08:50