-1

I have an AoC problem where I have been given the data below:

data = """2-4,6-8
    2-3,4-5
    5-7,7-9
    2-8,3-7
    6-6,4-6
    2-6,4-8"""

I need to find the number of pairs which fully contain another pair. For example, 2-8 fully contains 3-7, and 6-6 is fully contained by 4-6.

I have solved it using the below code:

 def aoc_part1(self, data):
        counter = 0
        for lines_data in data.splitlines():
            lines_data = lines_data.strip()
            first_range, second_range = self.__get_first_second_list_of_elements(lines_data)
            check_first_side_if_returns_true = all(item in first_range for item in second_range)
            check_second_side_if_returns_true = all(item in second_range for item in first_range)
            if check_first_side_if_returns_true or check_second_side_if_returns_true:
                counter += 1
        return counter
def __get_first_second_list_of_elements(self, data):
        first_elf, second_elf = data.split(",")[0], data.split(",")[1]
        first_range_start, first_range_end = map(int, first_elf.split("-"))
        second_range_start, second_range_end = map(int, second_elf.split("-"))
        first_range = list(range(first_range_start, first_range_end + 1))
        second_range = list(range(second_range_start, second_range_end + 1))
        return first_range, second_range

I was just wondering about the time complexity here. I think it should be a brute force here because for every iteration all will run another loop. How can I optimize this solution in order to get linear time complexity?

first_range and second_range are of int types. check_first_side_if_returns_true and check_second_side_if_returns_true are the boolean variables that check if the list is entirely contained or not. Based on that, it returns True or False.

RushHour
  • 494
  • 6
  • 25
  • 1
    I just comment (not answer), because I am very unsure whether I understood your question. The complexity w.r.t. the range size can of course be avoided. If you have the ranges `l1-u1, l2-u2`, you just need to check that `l1<=l2` and `u1>=u2`. – Dr. V Dec 04 '22 at 13:02
  • 1
    Please specify the types of these variables and the data types inside them: `first_range`, `second_range` – The Myth Dec 04 '22 at 13:03
  • 1
    Furthermore, there are calls to methods here that you didn't provide, so we can't tell what they do and what objects you manipulate. Always provide a [mre]. Anyway, as already stated in the first comment, you only need to check the bounds of the ranges... – Thierry Lathuille Dec 04 '22 at 13:13
  • Tweaking the code with problems like these, I don't think, will reduce O(n^3) to O(n), loops three time for check (elements >= 100). But use all the cores in the machine in parallel may come close to your stated goal. – MZM Dec 04 '22 at 13:16
  • 2
    You don't have to go over the full ranges to check the condition, you just need to compare the boundaries? – Timus Dec 04 '22 at 13:37
  • @Dr.V and @Timus. I tried the suggestion but how will the comparing work for `6-6,4-6`? The suggested approach doesn't work for this scenario. – RushHour Dec 04 '22 at 14:58
  • I tried this approach `if (first_range_start <= second_range_start and first_range_end >= second_range_end) or (second_range_start <= first_range_start and second_range_end >= second_range_start):` – RushHour Dec 04 '22 at 15:03
  • @MZM I get that you are trying to talk about scalability here. How can I scale here? Lists can't be scaled if I remember correctly. – RushHour Dec 04 '22 at 15:09
  • you don't need `all`, you just need to check start and end – njzk2 Dec 04 '22 at 15:12
  • 1
    what does `__get_first_second_list_of_elements` do? – njzk2 Dec 04 '22 at 15:14
  • in a nutshell, `(x, y) included in (a, b)` is as simple as `(x >= a and y <= b)`, assuming `y >= x` and `b >= a` – njzk2 Dec 04 '22 at 15:15
  • @njzk2 I have added the requested method. However, it's not like always that (x,y) will be in (a,b). It could be vice versa too. – RushHour Dec 04 '22 at 16:15
  • What will be the time complexity over here @njzk2? – RushHour Dec 04 '22 at 16:28
  • `list(range(` -> terrible idea: `range` has a good implementation of `in` (see https://stackoverflow.com/questions/30081275/why-is-1000000000000000-in-range1000000000000001-so-fast-in-python-3/), and creating it is constant time, but creating the list is linear to its length – njzk2 Dec 04 '22 at 20:03

2 Answers2

1

Your solution looks pretty complicated. Why not do something like:

data = """2-4,6-8
2-3,4-5
5-7,7-9
2-8,3-7
6-6,4-6
2-6,4-8
"""

def included(line):
    (a1, b1), (a2, b2) = (map(int, pair.split("-")) for pair in line.strip().split(","))
    return (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2)

print(sum(included(line) for line in data.splitlines()))

I did some timing with my AoC-input for day 4 (1,000 lines):

from timeit import timeit

# Extract the interval boundaries for the pairs
boundaries = [
    [tuple(map(int, pair.split("-"))) for pair in line.strip().split(",")]
    for line in data.splitlines()
]

# Version 1 with simple comparison of boundaries
def test1(boundaries):
    def included(pairs):
        (a1, b1), (a2, b2) = pairs
        return (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2)
    
    return sum(included(pairs) for pairs in boundaries)

# Version 2 with range-subset test
def test2(boundaries):
    def included(pairs):
        (a1, b1), (a2, b2) = pairs
        numbers1, numbers2 = set(range(a1, b1 + 1)), set(range(a2, b2 + 1))
        return numbers1 <= numbers2 or numbers2 <= numbers1
        
    return sum(included(pairs) for pairs in boundaries)

# Test for identical result
print(test1(boundaries) == test2(boundaries))

# Timing
for i in 1, 2:
    t = timeit(f"test{i}(boundaries)", globals=globals(), number=1_000)
    print(f"Duration version {i}: {t:.1f} seconds")

Result here, on a mediocre machine (repl.it):

Duration version 1: 0.4 seconds
Duration version 2: 5.4 seconds
Timus
  • 10,974
  • 5
  • 14
  • 28
  • 1
    Can you explain the time complexity here? Also, if the data size increases will it help the scalability? – RushHour Dec 04 '22 at 19:23
  • 2
    @RushHour Time complexity for one line is O(1), for the whole data O(n), with n the number of lines. – Timus Dec 04 '22 at 19:29
0

You're prob. making it overcomplicated in the current approach. If you split the pairs to two sets - eg. a, and b then you could easily do a set ops. to check if there is overlapping. That should be faster than yours.

Something like this one-line:

   # some input reading, and split to a, b sets.
   # count = 0

   if set(range(a, b + 1)) & set(range(x, y + 1)):
      count += 1                # that's part1 answer.

# part 2
for line in open('04.in'):
    a, b, x, y = map(int, line.replace(",", "-").split("-"))
    if set(range(a, b + 1)) & set(range(x, y + 1)):
        ans += 1

There are questions about the memory efficiency about this approach earlier, I've run some profiling and this is the result to share - it's confirmed there should be NO problem given this puzzle's input size.

Filename: day04.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    27   43.758 MiB   43.758 MiB           1   @profile
    28                                         def part2(file):
    29   43.762 MiB    0.004 MiB           1       ans = 0
    30
    31   43.770 MiB    0.000 MiB        1001       for line in open(file):
    32   43.770 MiB    0.004 MiB        1000           a, b, x, y = map(int, line.replace(",", "-").split("-"))
    33   43.770 MiB    0.000 MiB        1000           if set(range(a, b + 1)) & set(range(x, y + 1)):
    34   43.770 MiB    0.004 MiB         847               ans += 1
    35
    36   43.770 MiB    0.000 MiB           1       return ans
Daniel Hao
  • 4,922
  • 3
  • 10
  • 23
  • 2
    This certainly works but might be a bit inefficient if the ranges get very large. – Timus Dec 04 '22 at 19:01
  • Maybe. It's all depends on the requirement. It's not clear, and I've solved today's aoc puzzle. ;-) – Daniel Hao Dec 04 '22 at 19:03
  • Me too (your solution was my first attempt, then I decided against it) :) – Timus Dec 04 '22 at 19:05
  • not efficient. there's no need to create such big sets for solve the problem – njzk2 Dec 04 '22 at 20:04
  • 1
    For this puzzle - the largest ```range``` is 1-89,2-90 in the 300 pairs. So in principle, agreed with you @njzk2 - but it's not the concern *yet*... . – Daniel Hao Dec 04 '22 at 20:45
  • @DanielHao I absolutely agree that it doesn't matter for the puzzle at hand! There's still a significant difference: see my edit with timing results (hope they are right). – Timus Dec 04 '22 at 21:27
  • Well - version 2 does not seem to be my *post*? Could you update to the exact syntax, please. Thanks... – Daniel Hao Dec 04 '22 at 21:38
  • I'm a true follower of *Joe Armstrong** - "Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!" – Daniel Hao Dec 04 '22 at 21:41
  • @DanielHao I did adjust it because the question is about part 1 of the puzzle, which is testing for _full containment_, not about _any overlap_ (part 2). – Timus Dec 04 '22 at 21:46