8

I want to check whether a list is a valid sequence of chunks, where each chunk begins with some value and ends with the next occurrence of the same value. For example, this is a valid sequence of three chunks:

lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
       \___________/  \_____/  \_______________________/

And this is one is invalid:

lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
       \___________/  \_____/  \_____ ... missing the 2 to end the chunk

I have a solution but it's bad. Do you see something better?

def is_valid(lst):
    while lst:
        start = lst.pop(0)
        if start not in lst:
            return False
        while lst[0] != start:
            lst.pop(0)
        lst.remove(start)
    return True

# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
Red
  • 26,798
  • 7
  • 36
  • 58
no comment
  • 6,381
  • 4
  • 12
  • 30

7 Answers7

10

How about this, creating an iter from the list and searching forward on that iter until the next matching element is found. Note that this might fail is None can be an element of the list; then you should rather define and compare against a sentinel obj = object().

def is_valid(lst):
    it = iter(lst)
    for x in it:
        if next((y for y in it if y == x), None) is None:
            return False
    return True

Since we don't actually need the value returned by next, we can also just use any instead, at the same time solving the problem of the default element. Like next, any will consume the iterator just as far as the matching element, if any:

def is_valid(lst):
    it = iter(lst)
    for x in it:
        if not any(y == x for y in it):
            return False
    return True

This can be further shortened using all instead of the outer for loop:

def is_valid(lst):
    it = iter(lst)
    return all(any(y == x for y in it) for x in it)

And this can finally be reduced to the equally cryptic and intriguing:

def is_valid(lst):
    it = iter(lst)
    return all(x in it for x in it)

Each way, each element is visited exactly once, the original list is not changed, little to no extra space, and IMHO it's even somewhat easy to read and understand.


This never was about speed, but anyway: Here are some benchmarks of the different solutions (and some more variations), running the test cases from the question as well as two random lists of 1,000 integers, one valid and one invalid, 10,000 times, on Python 3.8.10:

# with long lists             # only short test lists
1.52 is_valid_index           0.22 is_valid_index
3.28 is_valid_next            0.30 is_valid_next
2.78 is_valid_for_for_else    0.13 is_valid_for_for_else
5.26 is_valid_for_any         0.32 is_valid_for_any
5.29 is_valid_all_any         0.38 is_valid_all_any
3.42 is_valid_all_any_if      0.36 is_valid_all_any_if
2.02 is_valid_all_in          0.18 is_valid_all_in
1.97 is_valid_all_in_if       0.17 is_valid_all_in_if
1.87 is_valid_for_in          0.11 is_valid_for_in

Of course, all are O(n). With the long 1000-element-lists, the solution using index is fastest, but the one with x in it is not too bad, either. The any solutions lag somewhat behind, but are about as fast (or slow) as next when using a generator with condition, but still slower than when using plain for loops. With only the short test-lists, it's a bit different: Here, the solutions using one iterator and for-for-else and for-in are fastest by quite some margin.

tobias_k
  • 81,265
  • 12
  • 120
  • 179
  • Hmm, "cryptic"? Maybe at first sight :-). But it simply finds the chunk start values and tests whether they appear again. Or maybe I'm just unusually familiar with using membership tests on iterators, I solved a few other problems with that before ([example](https://stackoverflow.com/a/69091885/16759116)). Btw I appreciate that you even used the same variable names (see the fourth test case read backwards. Maybe *that* was cryptic :-D). – no comment Sep 14 '21 at 03:01
  • @don'ttalkjustcode Of course `x in it` is exactly the same as `any(y == x for y in it)` to the point that I wonder why that heureka-moment took me so long, but somehow I still find the version with `all` and `any` somewhat clearer. Maybe the `for y in it` just makes it more explicit that it _continues_ iteration with the next elements. Still, very nice and short! – tobias_k Sep 14 '21 at 07:15
  • 1
    Well, *almost* exactly, as it [also checks identity](https://docs.python.org/3/reference/expressions.html#membership-test-operations) (but I don't consider it relevant for this question). Yes, I had been wondering as well, thinking something like "Come on, how do you not see that?" :-). Oh and looking up old stuff just now, I stumbled upon [this](https://stackoverflow.com/a/52709319/16759116)... look at what/who wim linked to :-D – no comment Sep 14 '21 at 08:33
  • Just [did it again](https://stackoverflow.com/a/69179044/16759116) :-D – no comment Sep 14 '21 at 13:56
  • Could you share the benchmark code so we can see what you measured and also run it ourselves? In trincot's benchmark for example, I think your final solution would be the fastest. – no comment Sep 15 '21 at 09:35
  • @don'ttalkjustcode Oh, did not see the update on the benchmark in trincot's comments either. Can add full code later (maybe also as a link), but busy now. – tobias_k Sep 15 '21 at 13:44
5

Here's my take on the problem. I've optimized for readability, not speed (while keeping it in O(n) of course):

def is_valid(sequence):
    iterator = iter(sequence)
    for element in iterator:
        for other in iterator:
            if element == other:
                break
        else:
            return False
    return True

Each iteration of the outer loop corresponds to a chunk. When we're out of elements here we ended the sequence at a chunk border, and we can return True. Otherwise, we loop through the iterator until we find a matching element. If we run out of elements (a for-loop that "naturally" terminates, without break, goes into its else) we return False.


And here's another one using itertools. I would not prefer it over the solution above, mostly because of the arcane use of next with a sentinel:

from itertools import dropwhile

def is_valid(iterable):
    iterator = iter(iterable)
    sentinel = object()
    for element in iterator:
        if next(dropwhile(lambda x: x != element, iterator), sentinel) is sentinel:
            return False
    return True
L3viathan
  • 26,748
  • 2
  • 58
  • 81
  • I thought about this as a variant for my first solution, too. You can use another outer `for` loop instead of `try/while/next/except`, though. – tobias_k Sep 09 '21 at 20:47
  • @tobias_k You're right, that looks nicer; editing. It's just the second solution where that wouldn't work. – L3viathan Sep 10 '21 at 17:05
  • @tobias_k I could even avoid the `try` there, if I use the alternate form of `next`... editing. – L3viathan Sep 10 '21 at 17:45
  • 1
    Hm, now that has become very close to my first variant, and you could shorten it with `all` (which is how I came to my third variant). I actually like your first one best; it's like one step "before" my 2nd and 3rd, but at the same time very readable indeed. – tobias_k Sep 10 '21 at 18:13
3

It seems like you want to make sure the last "chunk" is closed at the end of the list. This ought to do that:

def is_valid(lst):
  search = None
  paired = True
  for item in lst:
    if paired:
      search = item
      paired = False
    elif search == item:
      paired = True
  return paired

This is O(n), checks each element only one time, so you won't pay a cost for your start not in lst check that is expensive for long input lists.

eroot163pi
  • 1,791
  • 1
  • 11
  • 23
g.d.d.c
  • 46,865
  • 9
  • 101
  • 111
3

Mutating the list with pop(0) is costly and not needed.

You could use index... this may be particularly fast when the chunks are large:

def is_valid(lst):
    i = 0
    n = len(list)
    while i < n:
        try:
            i = lst.index(lst[i], i + 1) + 1
        except:
            return False
    return True
trincot
  • 317,000
  • 35
  • 244
  • 286
  • Index calls are equally costly for large input lists. You end up scanning the contents of the input list repeatedly. – g.d.d.c Sep 07 '21 at 18:31
  • yes, but they scan with compiled code, in contrast to a loop that iterates in Python code. The time complexity is the same, but the execution time will favour `index` when the chunks are relatively large. – trincot Sep 07 '21 at 18:31
  • @g.d.d.c This doesn't index from the list start, though, but from `i + 1`. – no comment Sep 07 '21 at 18:34
  • I missed "repeatedly": no, not repeatedly, @g.d.d.c – trincot Sep 07 '21 at 18:39
  • 1
    Here is a benchmark comparing this solution with gddc's, using a random list with 100000 single-digit numbers: [repl.it](https://replit.com/@trincottrincots/httpsstackoverflowcoma690929425459839) – trincot Sep 07 '21 at 18:47
  • @trincot I was just about to reuse your benchmark for a benchmark answer when I noticed you already updated it (though not with the newest). Do you intend to update and include it in your answer, or should I post a benchmark answer myself? – no comment Sep 14 '21 at 15:13
  • Go ahead an post one yourself, because I don't feel like taking it upon me to keep it in sync with every update and new answer that still may come ;-) – trincot Sep 14 '21 at 15:39
0

Below is an alternate, recursive solution to the problem. Basically just checks if the next target is in the list, and skips to that index to check again. I am not an expert here, but wanted to try and contribute a different way to solve the question.

def is_valid(
    input_list: list, 
    target_index: int = 0):

    # If we have only one element remaining, or if an empty list is passed initially, there cannot be a pair.
    if len(input_list) <= 1:
        return False
    
    target = input_list[target_index]
    search_range = input_list[target_index + 1 :]

    # print(f"target index: {target_index}")
    # print(f"target: {target}")
    # print(f"search range: {search_range}")
    # print("-------------------------------")

    if target in search_range:

        found_target_sublist_index = search_range.index(target)

        # Plus 2: two indexes start at 0 -> off by two
        next_target_index = target_index + found_target_sublist_index + 2

        if next_target_index == len(input_list):
            return True

        return is_valid(input_list, next_target_index)
    else:
        return False


test_one = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
test_two = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
test_three = ['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']
test_four = ['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']

print(is_valid(test_one)) 
print(is_valid(test_two))
print(is_valid(test_three))
print(is_valid(test_four))
  • As mentioned under another answer and indicated by my code, an empty list is a valid sequence (of zero chunks). – no comment Sep 07 '21 at 20:32
0

The question does not fully explain if we need a greedy solution or not.

Consider an example - [1, 2, 1, 1]

if we consider a greedy approach, the solution will find the first sequence as [1, 2, 1] and will be left with [1]. And hence, will return False.

But without a greedy approach, the solution will consider [1, 2, 1, 1] as a full sequence and will return True.

I ran the solution provided by you and it returns False, so I am assuming we need a greedy approach.

So, here is one possible solution:

def is_valid(lst):
    to_find = None
    
    for value in lst:
        if to_find is None:
            to_find = value
            continue
        
        if to_find is value:
            to_find = None

    return to_find is None
    
# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
rj99999
  • 109
  • 10
  • 1
    Actually the question does make that clear, saying "ends with the *next occurrence*". The "next occurrence" already emphasized in the question exactly for this reason. And I intentionally wrote the "invalid" example so that a "non-greedy" chunking is possible, to demonstrate that that's not valid. – no comment Sep 15 '21 at 18:59
  • `is` is the correct comparison for `None`, but not in general. For example, for `is_valid([x**x for x in [9, 9]])` you return `False`. – no comment Sep 15 '21 at 19:03
  • After that setup I was kind of hoping to see a non-greedy solution here... – tobias_k Sep 15 '21 at 19:03
  • @tobias_k Ha, yes, that might've been interesting. – no comment Sep 15 '21 at 19:21
  • [None-safe dict version](https://tio.run/##pY9PS8NAEMXv@RQPPCSBoWiq2AZ66CVQEIWSgyAi1WzblbAbdjeClH72dP/gtmJu7uHN7Mzvzex232YvxXTWqWFo2BZcv31tWt5krTZ5mcCeLRcNFhBSMBvk@yf7MFkeWlLB4j0DF7COYLgwHZyrDAyFUPpJx8mOmcxR5NkwTzHTKxHMXHvQ15Mr1EwbTdB72bcNOsWFKVErN7batNqGP5fEU1n80ktBuCfcEGaEwutlfku4I8wJ1z958Zrn/xkyZk9XKSF9dPLkZOnE1x6i1E7WTp5j92zztSpwoxvq3/wqmtZxaxXXjHHnlyxHN9jaMJwA) of this (probably less efficient but the symmetries are kinda cute). – no comment Sep 15 '21 at 19:21
  • @don'ttalkjustcode Yes, that one's also somewhere in my file of 20+ variations of `is_valid`... – tobias_k Sep 15 '21 at 19:40
0

A short attempt at creating a solution for this:

def isValid(input):
   if len(input) == 0:
       return True
   firstChar = input.pop(0)
   if firstChar not in input:
       return False
   input = input[input.index(firstChar)+1:]
   isValid(input)

While I do not think this is the fastest method, I think it is an interesting enough method to include here. Additionally, this can be optimised a bit further by removing the lines:

if firstChar not in input:
    return False

And put the code in a try/except block, like so:

def isValid(input):
    if len(input) == 0:
        return True
    firstChar = input.pop(0)
    try:
        input = input[input.index(firstChar)+1:]
        isValid(input)
    except:
        return False

as this code would give a ValueError if the index doesnt exist

I haven't tested the exact speed difference at the moment, but I am certain it is not the fastest method, but it should be relatively decent speed-wise.

Zaid Al Shattle
  • 1,454
  • 1
  • 12
  • 21
  • 2
    Note that with `pop(0)` and list-slicing this has O(n²) in the worst case, e.g. for `[1,1,2,3,4,5, ... ,1000,1000]`. Using a `start` parameter in the recursion would be faster. – tobias_k Sep 16 '21 at 13:52
  • I see your point about the O(n^2) complexity, but can you explain what you mean with the `start` parameter? @tobias_k – Zaid Al Shattle Sep 16 '21 at 13:55
  • 1
    If you want to keep it recursive, you can `def is_valid(lst, start=0)` and then recurse as `is_valid(lst, start=lst.index(lst[start], start+1) + 1)`, similar to [trinkot's answer](https://stackoverflow.com/a/69093017/1639625). (Oh, and the example in my first comment should of course be `[1,1,2,2,3,3,...]`) – tobias_k Sep 16 '21 at 14:02
  • Ohh, that makes a lot of sense. Thanks for the tip. I will edit the code and try to optimize it a bit further as soon as I am home. Cheers mate. – Zaid Al Shattle Sep 16 '21 at 14:05
  • *"I haven't tested the exact speed difference"* - Also seems you haven't tested at all :-). Given that this fails four of the five test cases from the question (actually all of them, since you changed the function name). – no comment Sep 17 '21 at 03:19
  • @don't talk just code , Apologies, I ran it through my mobile compiler only at the time when I was returning home, and tested the values in the first two examples you showed which worked for me. – Zaid Al Shattle Sep 18 '21 at 06:53