3

I have several arrays of variable lengths filled with tuples that represent good and bad blocks of data.

input = [
    [(True, 0, 400), (False, 400, 500), (True, 500, 1000)],
    [(True, 0, 200), (False, 200, 400), (True, 400, 1000)],
    [(False, 0, 100), (True, 100, 1000])]]

What I want to do is create a new list of tuples that represent blocks of data that are good across all of my arrays. The above would end up as follows.

output = [(False, 0, 100), (True, 100, 200), (False, 200, 500), 
         (True, 500, 1000)]

The tuples in each original array are guaranteed to be alternating True and False. Each array will also have the same start and end (in the case above would be 0 and 1000).

test = fn(input)
assert test == output

My goal is to make this work in O(n) time but I haven't been able to figure it out.

enter image description here

trblair82
  • 33
  • 5
  • Hi! It's a bit unclear what you're doing. I don't understand by what logic you go from your input to your output. – Stef Feb 16 '22 at 17:23
  • 1
    I added an image to visually represent what the goal is. If there is a True value in the output that means that for that range all of the input values are True. – trblair82 Feb 16 '22 at 18:53
  • 1
    This is strongly related to [Algorithm to total the combined length of segments](https://stackoverflow.com/questions/70970922/algorithm-to-total-the-combined-length-of-segments) and [Union of multiple ranges](https://stackoverflow.com/a/34279227/3080723). – Stef Feb 18 '22 at 12:08
  • In python using sympy's interval-merging algorithms: `from sympy import Interval, Union; trues = Union(*(Interval(a,b) for l in input for t,a,b in l if t)); falses = Union(*(Interval(a,b) for l in data for t,a,b in l if not t)); all_true = trues - falses` And now `falses` contains all intervals that have at least one false, and `all_true` contains all intervals that are completely true. With your data, the result is `all_true = Union(Interval.open(100, 200), Interval.Lopen(500, 1000))` and `falses = Union(Interval(0, 100), Interval(200, 500))`. – Stef Feb 18 '22 at 12:10
  • 1
    @Stef the sympy implementation does not run in O(n) – trblair82 Feb 23 '22 at 00:23

2 Answers2

4

Basically, you want to merge the arrays. You can do it pairwise: firstly, merge the first array with second one, then merge the result with the third array, and so on.

How to merge, two arrays? Run a loop with two indexes, when over the first array, the second over the second one. In each loop iteration you increment one of the indexes, the one which represents the tuple with lower data position. Then you merge the arrays tuple by tuple.

As an example, let us consider you first two arrays.

[(True, 0, 400), (False, 400, 500), (True, 500, 1000)],
[(True, 0, 200), (False, 200, 400), (True, 400, 1000)],

We begin with i pointing at (True, 0, 400) and j pointing at (True, 0, 200). None is false, so our output starts with (True, 0, 200) (the common part of these two tuples). Then we increment j (because its tuple ends in 200).

In the next iteration i and j point to tuples (True, 0, 400) and (False, 200, 400), respectively. We merge them, and get (False, 200, 400), which we add to our output. Now both tuples end with the same data position, so we increment both.

Next, we get tuples (False, 400, 500) and (True, 400, 1000). We merge them to obtain (False, 400, 500). Because the last tuple in our output was False, instead of adding one more False tuple, we simply extend the last tuple in output to 500. We increment i.

In the last iteration we have, (True, 500, 1000) and (True, 400, 1000) which we merge to get (True, 500, 1000).

Thus, our output is [(True, 0, 200), (False, 200, 500), (True, 500, 1000)].

ciamej
  • 6,918
  • 2
  • 29
  • 39
0

So you have to take the possibly overlapping unions of False ranges. The True ranges could be used to find 0 - 1000. At the end add True ranges to fill the gaps.

....FFF...FFF...FFFFF.... old partial result, initially empty
......FFFFFFFFFFF.....F.. next input ranges
------------------------- union
....FFFFFFFFFFFFFFFFF.F.. new partial result

So model the input:

record Range(boolean onoff, int x1, int x2) { }

Range[][] input = {
    {new Range(true, 0, 400), new Range(false, 400, 500), new Range(true, 500, 1000)},
    {new Range(true, 0, 200), new Range(false, 200, 400), new Range(true, 400, 1000)},
    {new Range(false, 0, 100), new Range(true, 100, 1000)}
}:

List<Range> accumulatedFalse new ArrayList<>();
for (Range[] ranges: input) {
    int x0 = 0;
    for (Range range: ranges) {
        if (!range.onoff) {
            // Accumulate this range:
            ... check for any overlapping and update accumulatedFalse 
        }
    }
}
... Fill True ranges in the gaps ...

And of course I do not spoil the fun coding this.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138