Algorithm to find shortest continuous subarray that contains all values from a set

Question

I have the following problem to solve:

Given a set of integers, e.g. {1,3,2}, and an array of random integers, e.g.

[1, 2, 2, -5, -4, 0, 1, 1, 2, 2, 0, 3,3]

Find the shortest continuous subarray that contains all of the values from the set. If the subarray can not be found, return an empty array.

Result: [1, 2, 2, 0, 3]

Or

[1, 2, 2, -5, -4, 3, 1, 1, 2, 0], {1,3,2}.

Result: [3, 1, 1, 2]

I have tried the following put there seems to be something wrong with my second loop. I'm not sure what I need to change:

def find_sub(l, s):
    i = 0
    counts = dict()
    end = 0
    while i < len(s):
        curr = l[end]
        if curr in s:
            if curr in counts:
                counts[curr] = counts[curr] + 1
            else:
                counts[curr] = 1
                i += 1
        end += 1
    curr_len = end

    start = 0
    for curr in l:
        if curr in counts:
            if counts[curr] == 1:
                if end < len(l):
                    next_item = l[end]
                    if next_item in counts:
                        counts[next_item] += 1
                    end += 1
            else:
                counts[curr] -= 1
                start += 1
        else:
            start += 1
    if (end - start) < curr_len:
        return l[start:end]
    else:
        return l[:curr_len]

What is the question? Find algorithm that does it? Write a program? Find a problem in your code? [How you should ask a question](https://stackoverflow.com/help/how-to-ask) — vishes_shell, May 06 '19 at 07:21

MBo · Accepted Answer · 2019-05-06T08:11:57.833

You are using two-pointer approach, but move both indexes only once - until the first match found. You should repeat move right - move left pattern to get the best index interval.

def find_sub(l, s):
    left = 0
    right = 0
    ac = 0
    lens = len(s)
    map = dict(zip(s, [0]*lens))
    minlen = 100000
    while left < len(l):
        while right < len(l):
            curr = l[right]
            right += 1
            if curr in s:
                c = map[curr]
                map[curr] = c + 1
                if c==0:
                    ac+=1
                    if ac == lens:
                        break
        if ac < lens:
            break

        while left < right:
            curr = l[left]
            left += 1
            if curr in s:
                c = map[curr]
                map[curr] = c - 1
                if c==1:
                    ac-=1
                    break

        if right - left + 1 < minlen:
            minlen = right - left + 1
            bestleft = left - 1
            bestright = right

    return l[bestleft:bestright]

print(find_sub([1, 2, 2, -5, -4, 3, 1, 0, 1, 2, 2, 0, 3, 3], {1,3,2}))
print(find_sub([1, 2, 2, -5, -4, 3, 1, 0, 1, 2, 2, 1, 0, 3, 3], {1,3,2}))
>>[2, -5, -4, 3, 1]
>>[2, 1, 0, 3]

Netwave · Answer 2 · 2019-05-06T07:30:00.710

You can use a sliding window approach (using a generator), the idea is to generate all subsets of size n (size of the set) to size N (size of the list), and check if any of them exists, stopping when finding the first one:

from itertools import islice, chain

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result


l = [1, 2, 2, -5, -4, 3, 1, 1, 2, 0]
s = {1,3,2}

def minimum_subset(l, s):
    for w in chain.from_iterable(window(l, i) for i in range(len(s), len(l)+1)):
        if s == set(w):
            return w
    return []

print(minimum_subset(l, s))

Result (3, 1, 1, 2) Here you have the live example

This will be quite slow for long lists, the two-pointer approach is faster — fafl, May 06 '19 at 07:36

fafl · Answer 3 · 2019-05-08T19:43:20.153

This should be the most performant solution, running in O(n):

def find_sub(l, s):
    if len(l) < len(s):
        return None

    # Keep track of how many elements are in the interval
    counters = {e: 0 for e in s}

    # Current and best interval
    lo = hi = 0
    best_lo = 0
    best_hi = len(l)

    # Increment hi until all elements are in the interval
    missing_elements = set(s)
    while hi < len(l) and missing_elements:
        e = l[hi]
        if e in counters:
            counters[e] += 1
        if e in missing_elements:
            missing_elements.remove(e)
        hi += 1

    if missing_elements:
        # Array does not contain all needed elements
        return None

    # Move the two pointers
    missing_element = None
    while hi < len(l):
        if missing_element is None:
            # We have all the elements
            if hi - lo < best_hi - best_lo:
                best_lo = lo
                best_hi = hi

            # Increment lo
            e = l[lo]
            if e in counters:
                counters[e] -= 1
                if counters[e] == 0:
                    missing_element = e
            lo += 1
        else:
            # We need more elements, increment hi
            e = l[hi]
            if e in counters:
                counters[e] += 1
                if missing_element == e:
                    missing_element = None
            hi += 1

    return l[best_lo:best_hi]


assert find_sub([1, 2, 2, -5, -4, 3, 1, 0, 1, 2, 2, 0, 3, 3], {1, 3, 2}) == [2, -5, -4, 3, 1]
assert find_sub([1, 2, 2, -5, -4, 3, 1, 0, 1, 2, 2, 1, 0, 3, 3], {1, 3, 2}) == [2, 1, 0, 3]
assert find_sub([1, 2, 2, -5, -4, 3, 1, 0, 1, 2, 2, 1, 0, 3, 3], {1, 3, 7}) is None

If `any()` works how I would expect, this takes O(n^2) time. You could fix this by keeping a separate variable that tracks the number of zeros in `counters`, and updating it if necessary every time you increment or decrement an element of `counters`. — j_random_hacker, May 07 '19 at 08:24
I didn't want to optimize too much, but your comment inspired me to do better. The new code should run in O(n). — fafl, May 07 '19 at 09:31
Looks good :) One more nitpick and then I'll +1: If even the entire array doesn't contain all elements, your code will quietly return the entire array anyway. — j_random_hacker, May 07 '19 at 09:40

score 0 · Answer 4 · answered May 06 '19 at 09:22

Joining in the fun, here's my attempt. I'm not familiar with algorithm names, but this would seem like a sliding window approach based on @Netwave's description for his answer.

I = {1, 3, 2}
A = [1, 2, 2, -5, -4, 0, 1, 1, 2, 2, 0, 3, 3]

setcount = {i: 0 for i in I}
stage = []
shortest = A

for i in range(len(A)):
    # Subset
    stage.append(A[i])
    # Update the count
    if A[i] in I:
        setcount[A[i]] += 1

    while 0 not in setcount.values():
        # Check if new subset is shorter than existing's
        if len(stage) < len(shortest):
            shortest = stage.copy()

        # Consume the head to get progressively shorter subsets
        if stage[0] in I:
            setcount[stage[0]] -= 1

        stage.pop(0)


>>>print(shortest)
[1, 2, 2, 0, 3]

Algorithm to find shortest continuous subarray that contains all values from a set

4 Answers4

Linked