How to get rid of sub-tuples in this list?

Question

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]

Since (0,2) & (4,6) are both within the indexes of (0,6), so I want to remove them. The resulting list would be:

list_of_tuple = [(0,6), (6,7), (8,9)]

It seems I need to sort this tuple of list somehow to make it easier to remove. But How to sort a list of tuples?

Given two tuples of array indexes, [m,n] and [a,b], if:

m >=a & n<=b

Then [m,n] is included in [a,b], then remove [m,n] from the list.

What's the exact criteria you're using to remove them? It seems like you're treating them as intervals, right? So if any interval is contained within another, it should be removed? You could do that with `range` in O(n^2) time pretty easily. — wjandrea, Jun 05 '20 at 18:16
Does this answer your question? [Collapse a list of range tuples into the overlapping ranges](https://stackoverflow.com/questions/10790415/collapse-a-list-of-range-tuples-into-the-overlapping-ranges) — Georgy, Jun 05 '20 at 18:26

Red · Answer 1 · 2020-06-07T03:22:27.717

2

To remove all tuples from list_of_tuples with a range out of the specified tuple:

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]

def rm(lst,tup):
    return [tup]+[t for t in lst if t[0] < tup[0] or t[1] > tup[1]]

print(rm(list_of_tuple,(0,6)))

Output:

[(0, 6), (6, 7), (8, 9)]

edited Jun 07 '20 at 03:22

answered Jun 05 '20 at 19:43

Red

26,798
7
36
58

2

I see a problem with both of these solutions. The first doesn't compare *anything*, it simply tosses tuples on either side of an arbitrarily chosen tuple in the list. The second does the same using a comprehension. Both ignore the OP's criteria formula for when a tuple is removed from the list. – cdlane Jun 06 '20 at 07:29
1

Ann, the OP states that a tuple `(m,n)` is removed if there exists another tuple `(a,b)` such that `m >= a and n <= b`. Some interpreted this as applying to a randomized list, but since the OP's list is already sorted in a fashion, I chose to interpret that as a given -- an unconfirmed assumption. – cdlane Jun 06 '20 at 15:42
Nope, you're only looking at one element of the tuple but both have to be considered. Also, yours is the only answer that takes an element of the list as an argument -- perhaps you're picking up on something the rest of us are missing. – cdlane Jun 07 '20 at 03:20

wjandrea · Answer 2 · 2020-06-05T18:54:28.353

1

Here's a dead-simple solution, but it's O(n²):

intervals = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]  # list_of_tuple
result = [
    t for t in intervals
    if not any(t != u and t[0] >= u[0] and t[1] <= u[1] for u in intervals)
    ]

It filters out intervals that are not equal to, but contained in, any other intervals.

edited Jun 05 '20 at 18:54

answered Jun 05 '20 at 18:49

wjandrea

28,235
9
60
81

It's actually better than claim. It will remove duplicates as well, which is the case if the tuples are equal. @wjandrea – AirSquid Jun 05 '20 at 18:54
@JeffH It doesn't remove duplicates. For example if you add another `(0, 6)`, it will be included in the result. If you wanted to remove duplicates, just convert `intervals` to a set first. – wjandrea Jun 05 '20 at 18:56
1

@awarrier99 No, you can't rely on identity for immutable types since they can be interned. For example I just ran `(0, 6) is (0, 6)` on CPython 3.7 and got `True`. – wjandrea Jun 05 '20 at 19:04
Yep. You are correct. My test case was cross-pollinated with the other solutions. I was wondering why it *appeared* to be pulling duplicates. :) – AirSquid Jun 05 '20 at 19:05
@wjandrea you're right, that's why I removed my comment. I tested with `x = (0, 6)` and `y = (0, 6)`, which resulted in `x is y` printing `False`, but I spoke too soon because when I tested it out, it didn't actually make a difference – awarrier99 Jun 05 '20 at 19:05
@wjandrea What exactly is the difference here and why does it return `True` when directly comparing as opposed to storing the values in variables? – awarrier99 Jun 05 '20 at 19:07
1

@awarrier99 I saw a great answer about that at one point but can't find it now. If I remember correctly it's that, on the REPL, single lines are compiled together, and immutable objects are stored as the same object, but separate lines are not. – wjandrea Jun 05 '20 at 19:48
@wjandrea thanks, I'm not sure if I found the answer you're referring to but for the most part I see that it can change depending on the implementation whether it decides to intern or not – awarrier99 Jun 05 '20 at 20:15
There is a bug. Some tests don't pass – marlon Jun 06 '20 at 07:42
@marlon What tests? Your question has only one example input. – wjandrea Jun 06 '20 at 16:13

score 1 · Answer 3 · answered Jun 06 '20 at 07:12

Seems like an opportunity to abuse both reduce() and Python's logical operators! Solution assumes list is sorted as in the OP's example, primarily on the second element of each tuple, and secondarily on the first:

from functools import reduce

list_of_sorted_tuples = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]

def contains(a, b):
    return a[0] >= b[0] and a[1] <= b[1] and [b] or b[0] >= a[0] and b[1] <= a[1] and [a] or [a, b]

reduced_list = reduce(lambda x, y: x[:-1] + contains(x[-1], y) if x else [y], list_of_sorted_tuples, [])

print(reduced_list)

OUTPUT

> python3 test.py
[(0, 6), (6, 7), (8, 9)]
>

awarrier99 · Accepted Answer · 2020-06-05T18:48:03.160

You could try something like this to check if both ends of the (half-open) interval are contained within another interval:

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
reduced_list = []
for t in list_of_tuple:
    add = True
    for o in list_of_tuple:
        if t is not o:
            r = range(*o)
            if t[0] in r and (t[1] - 1) in r:
                add = False

    if add:
        reduced_list.append(t)

print(reduced_list) # [(0, 6), (6, 7), (8, 9)]

Note: This assumes that your tuples are half-open intervals, i.e. [0, 6) where 0 is inclusive but 6 is exclusive, similar to how range would treat the start and stop parameters. A couple of small changes would have to be made for the case of fully closed intervals:

range(*o) -> range(o[0], o[1] + 1)

and

if t[0] in r and (t[1] - 1) in r: -> if t[0] in r and t[1] in r:

score 0 · Answer 5 · answered Jun 05 '20 at 19:27

Here is the first step towards a solution that can be done in O(n log(n)):

def non_cont(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    i = 1
    while i < len(s):
        if s[i][0] >= s[i - 1][0] and s[i][1] <= s[i - 1][1]:
            del s[i]
        else:
            i += 1
    return s

The idea is that after sorting using the special key function, the each element that is contained in some other element, will be located directly after an element that contains it. Then, we sweep the list, removing elements that are contained by the element that precedes them. Now, the sweep and delete loop is, itself, of complexity O(n^2). The above solution is for clarity, more than anything else. We can move to the next implementation:

def non_cont_on(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    i = 1
    result = s[:1]
    for i in s:
        if not (i[0] >= result[-1][0] and i[1] <= result[-1][1]):
            result.append(i)
    return result

There is no quadratic sweep and delete loop here, only a nice, linear process of constructing the result. Space complexity is O(n). It is possible to perform this algorithm without extra, non-constant, space, but I will leave this out.

A side effect of both algorithm is that the intervals are sorted.

score 0 · Answer 6 · answered Jun 05 '20 at 22:43

If you want to preserve the information about the inclusion-structure (by which enclosing interval an interval of the original set is consumed) you can build a "one-level tree":

def contained(tpl1, tpl2):
    return tpl1[0] >= tpl2[0] and tpl1[1] <= tpl2[1] 

def interval_hierarchy(lst):
    if not lst:
        return
    root = lst.pop()
    children_dict = {root: []}
    while lst:
        t = lst.pop()
        curr_children = list(children_dict.keys())
        for k in curr_children:
            if contained(k, t):
                children_dict[t] = (children_dict[t] if t in children_dict else []) +\
                                   [k, *children_dict[k]]
                children_dict.pop(k)
            elif contained(t, k):
                children_dict[k].append(t)
                if t in children_dict:
                    children_dict[k] += children_dict[t]
                    children_dict.pop(t)
            else:
                if not t in children_dict:
                    children_dict[t] = []
    # return whatever information you might want to use
    return children_dict, list(children_dict.keys())

score 0 · Answer 7 · answered Jun 05 '20 at 22:53

It appears you are trying to merge intervals which are overlapping. For example, (9,11), (10,12) are merged in the second example below to produce (9,12).

In that case, a simple sort using sorted will automatically handle tuples.

Approach: Store the next interval to be added. Keep extending the end of the interval until you encounter a value whose "start" comes after (>=) the "end" of the next value to add. At that point, that stored next interval can be appended to the results. Append at the end to account for processing all values.

def merge_intervals(val_input):
    if not val_input:
        return []
    vals_sorted = sorted(val_input)   # sorts by tuple values "natural ordering"
    result = []
    x0, x1 = vals_sorted[0]           # store next interval to be added as (x0, x1)
    for start, end in vals_sorted[1:]:
        if start >= x1:               # reached next separate interval
            result.append((x0, x1))
            x0, x1 = (start, end)
        elif end > x1:
            x1 = end                  # extend length of next interval to be added
    result.append((x0, x1))
    return result

print(merge_intervals([(0,2), (0,6), (4,6), (6,7), (8,9)]))
print(merge_intervals([(1,2), (9,11), (10,12), (1,7)]))

Output:

[(0, 6), (6, 7), (8, 9)]
[(1, 7), (9, 12)]

The merge of `(9,11)` and `(10,12)` doesn't seem to match the OP's criteria of `a[0] >= b[0] and a[1] <= b[1]` for reducing tuples. — cdlane, Jun 06 '20 at 07:20

How to get rid of sub-tuples in this list?

7 Answers7