45

I'm looking for a way to easily determine if all not None items in a list occur in a single continuous slice. I'll use integers as examples of not None items.

For example, the list [None, None, 1, 2, 3, None, None] meets my requirements for continuous integer entries. By contrast, [1, 2, None, None, 3, None] is not continuous, because there are None entries between integers.

Some more examples to make this as clear as possible.

Continuous:
[1, 2, 3, None, None]
[None, None, 1, 2, 3]
[None, 1, 2, 3, None]

Not Continuous:
[None, 1, None, 2, None, 3]
[None, None, 1, None, 2, 3]
[1, 2, None, 3, None, None]

My first approach was to use variables to keep track of whether or not we had come across a None yet, and whether or not we had come across an int yet -- this ends up with a highly nested and very difficult to follow series of if/else statements embedded in a for loop. (On top of the ugliness, I confess I haven't gotten it to work in every case).

Anyone know an easier way to figure out if the not None items in a list occur in a single continuous slice?

Clay Wardell
  • 2
    What result would you expect for a sequence of None (`[None, None, None, None]`)? – camh Feb 07 '13 at 11:27
  • 1
    That is an important question for the general problem, but for my specific needs, it wouldn't really matter whether `[None, None, None]` is determined to be continuous or not. What I'm actually doing is grabbing the first set of not None items, and then doing stuff with them -- what exactly it is that I do depends on whether it is the only continuous slice. So -- all None's wouldn't even get me to the first step in the process. But, theoretically, it should probably return True, right? – Clay Wardell Feb 07 '13 at 15:46
  • If it is guaranteed that the list contains at least one non-None item, please edit that into the question. And what about the cases where the list is length-zero `[]`, or None? Should it give True, False, None, raise exception or you totally don't care? For simplicity, we might as well (retro-)define the don't-care behavior according to what turns out to be most Pythonic. – smci May 08 '19 at 03:51
  • Btw, the appropriate word is 'contiguous' not 'continuous'. – smci May 08 '19 at 03:52
  • What result do you require for `[3, 1, 2, None, None]`? – Vorac Sep 11 '20 at 06:59

11 Answers

44
def contiguous(seq):
    seq = iter(seq)
    all(x is None for x in seq)        # Burn through any Nones at the beginning
    any(x is None for x in seq)        # and the first group
    return all(x is None for x in seq) # everything else (if any) should be None.

Here are a couple of examples. You can use next(seq) to get the next item from an iterator. I'll put a mark pointing to the next item after each call.

example1:

seq = iter([None, 1, 2, 3, None])        #  [None, 1, 2, 3, None]
                                         # next^
all(x is None for x in seq)            
                                         #        next^
any(x is None for x in seq)            
                                         #                    next^ (off the end)
return all(x is None for x in seq)       # all returns True for the empty sequence

example2:

seq = iter([1, 2, None, 3, None, None])  #    [1, 2, None, 3, None, None]
                                         # next^
all(x is None for x in seq)            
                                         #    next^
any(x is None for x in seq)            
                                         #             next^  
return all(x is None for x in seq)       # all returns False when 3 is encountered
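
As a quick sanity check against the examples from the question (a sketch, not part of the original answer):

assert contiguous([None, None, 1, 2, 3, None, None])   # continuous
assert contiguous([1, 2, 3, None, None])                # continuous
assert not contiguous([None, 1, None, 2, None, 3])      # not continuous
assert not contiguous([1, 2, None, 3, None, None])      # not continuous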
John La Rooy
  • 4
    So simple ... Yet so right. darn... wish I had thought of it. – mgilson Feb 06 '13 at 04:42
  • 1
    Note that your comments aren't exactly right (I think). You'll burn through all the None's at the beginning plus the first non-None with the first `all`. I don't think that changes the result though. – mgilson Feb 06 '13 at 04:47
  • 12
    +1, because this is very clever, and I never would have thought of it. I'm not totally sold on the use of `any` and `all` for side-effect purposes of advancing an iterator, but I haven't got a good argument against it. – DSM Feb 06 '13 at 04:52
  • 5
    After trying to look up how this worked, I'm a bit stumped -- according to the docs, all() and any() return True or False depending on the contents of their input... and yet they seem to be used to modify their input in place here. You know where I might find some docs explaining how this works? – Clay Wardell Feb 06 '13 at 04:52
  • @mgilson, there wasn't enough room in the margin :) You are correct though – John La Rooy Feb 06 '13 at 04:53
  • @ClayWardell, the trick is to make seq an iterator. Then `any` and `all` move the iterator along (consume it) even though I don't need the return value of the first two. – John La Rooy Feb 06 '13 at 04:55
    -1 because you are using implicit implementation details of `all` & `any`. While this is a clever use of iterators, I would never recommend this kind of programming trick. You have to respect the semantics of the stl. In the current case, `groupby` makes the solution easier to understand. – Simon Bergot Feb 06 '13 at 11:05
  • 4
    @Simon, It uses _defined_ behaviour of all and any. Why do you think it's implicit? What do you mean by stl? – John La Rooy Feb 06 '13 at 11:26
  • 1
    @gnibbler - could you please explain in the answer why this works, as per Clay's comment it is not really obvious – mmmmmm Feb 06 '13 at 12:00
    @gnibbler just look at the [definitions of `all`/ `any`](http://docs.python.org/2/library/functions.html#all) in the docs. The laziness of those functions is never explicitly stated. You can deduce it from the `equivalent to` snippets. I would still put this particular behavior in the undocumented part of the function, and would not rely on it (except for performance). Just look at the comments asking for clarifications. Your version is harder to understand. – Simon Bergot Feb 06 '13 at 12:32
  • @Simon, so you're saying that the short-circuit behaviour of any/all is not explicit? That would have terrible performance implications for a lot of code. Some people think this version is easier to read, so I guess it's subjective – John La Rooy Feb 06 '13 at 12:52
  • 1
    @gnibbler [dropwhile](http://docs.python.org/2/library/itertools.html#itertools.dropwhile) or takewhile are explicit about this behavior. any/all are not. I would rather go with those functions if I wanted to implement your version. – Simon Bergot Feb 06 '13 at 13:00
  • 5
    @Simon It *is* explicitly stated. The documentation *states* that `any` and `all` are **equivalent** to those pieces of code. Equivalent means that using the built-in function or the sample code should always produce the same results. Since the sample code is lazy, the built-in `any` and `all` must also be lazy. The docs' writers simply decided that instead of pointing out all the corner cases of these simple functions one by one, their properties are better explained with a small piece of readable code. – Bakuriu Feb 06 '13 at 15:36
  • 1
    `You'll burn through all the None's at the beginning plus the first non-None with the first all.` Would this fail on [None,1,None,2,3,None] or [None,None,1,None] then? – Rob Feb 06 '13 at 16:01
  • @Rob, it's easy enough to just try those cases. The first one returns `False` and the second returns `True` as expected. – John La Rooy Feb 06 '13 at 17:14
  • 2
    -1 because unreadable at first sight, which IMHO does not qualify as "pythonic way" – Gyom Feb 14 '13 at 09:32
25

Good ol' itertools.groupby to the rescue:

from itertools import groupby

def contiguous(seq):
    return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1

gives

>>> contiguous([1,2,3,None,None])
True
>>> contiguous([None, 1,2,3,None])
True
>>> contiguous([None, None, 1,2,3])
True
>>> contiguous([None, 1, None, 2,3])
False
>>> contiguous([None, None, 1, None, 2,3])
False
>>> contiguous([None, 1, None, 2, None, 3])
False
>>> contiguous([1, 2, None, 3, None, None])
False

[edit]

Since there seems to be some discussion in the comments, I'll explain why I like this approach better than some of the others.

We're trying to find out whether there is one contiguous group of non-None objects, and

sum(1 for k,g in groupby(seq, lambda x: x is not None) if k)

counts the number of contiguous non-None groups, using the stdlib function that is designed for collecting contiguous groups. As soon as we see groupby, we think "contiguous groups", and vice versa. In that sense, it's self-documenting: it is basically the definition of my goal.

IMHO the only weakness is that it doesn't short-circuit, and that could be fixed, but after thinking about it some I still prefer this as it uses a primitive I like -- "count the number of contiguous non-None groups" -- which I prefer to simply "tell me whether or not there is more than one contiguous non-None group as soon as you can".

Many of the approaches to implement the last one rely on clever observations about the problem, like "if there's only one contiguous group of not-None objects, then if we scan until we find the first not-None object, and then scan through objects until we find the first non-None group if one exists, then whether anything's left is None gives us our answer." (Or something like that, which is part of my issue: I have to think about it.) To me that feels like using "implementation details" about the problem to solve it, and focuses on properties of the problem we can use to solve it, rather than simply specifying the problem to Python and letting Python do the work.

I'm a bear of very little brain, as the saying has it, and I like to avoid having to be clever, as in my experience it's a route littered with FAIL.

As always, everyone's mileage may vary, of course, and probably in proportion to their cleverness.
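
For completeness, the short-circuiting fix mentioned above could look something like this (a sketch using islice, along the lines suggested in the comments; `contiguous_sc` is just an illustrative name, not from the original answer):

from itertools import groupby, islice

def contiguous_sc(seq):
    # Generate one value per contiguous non-None group, but pull at most two
    # of them; exactly one group means the non-None items are contiguous.
    groups = (k for k, g in groupby(seq, lambda x: x is not None) if k)
    return len(list(islice(groups, 2))) == 1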

DSM
  • can't you use `any` or `all` instead of `sum`? Seems wasteful to look through the whole list if you don't need to. – John La Rooy Feb 06 '13 at 04:28
  • @gnibbler: I can't immediately see how to use `any` or `all`. I guess you could use `len(list(islice((k for k,g in groupby(seq, lambda x: x is not None) if k), 0, 2))) == 1` or something, but I don't like the look of it much. – DSM Feb 06 '13 at 04:31
  • 1
    This is better than mine. +1 – mgilson Feb 06 '13 at 04:32
  • @DSM, i found a way using `all`. Surprisingly it doesn't use groupby – John La Rooy Feb 06 '13 at 04:43
12

You could use something like itertools.groupby:

from itertools import groupby

def are_continuous(items):
    saw_group = False

    for group, values in groupby(items, lambda i: i is not None):
        if group:
            if saw_group:
                return False
            else:
                saw_group = True

    return True

This will iterate only until it sees a second group. I'm not sure whether you consider [None, None] to be continuous, so tweak it to your needs.
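
Note that with this implementation an all-None list counts as continuous, since no group of values is ever seen. A quick check (my addition, not part of the original answer):

assert are_continuous([None, 1, 2, None])       # True: one group
assert not are_continuous([1, None, 2])          # False: two groups
assert are_continuous([None, None])              # True: no group at all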

Blender
7

This may not be the best way to go about doing it, but you can look for the first non-None entry and the last non-None entry and then check the slice for None. e.g.:

def is_continuous(seq):
    try:
        first_non_none_pos = next(i for i, x in enumerate(seq) if x is not None)
        # The `or None` on the next line handles the case where the last item is
        # not None (otherwise -0 == 0 would produce an empty slice).
        last_non_none_pos = -next(i for i, x in enumerate(reversed(seq)) if x is not None) or None
    except StopIteration:  # list consists entirely of `None`s
        return False
    return None not in seq[first_non_none_pos:last_non_none_pos]

assert is_continuous([1,2,3,None,None]) == True
assert is_continuous([None, 1,2,3,None]) == True
assert is_continuous([None, None, 1,2,3]) == True
assert is_continuous([None, 1, None, 2,3]) == False
assert is_continuous([None, None, 1, None, 2,3]) == False
assert is_continuous([None, 1, None, 2, None, 3]) == False
assert is_continuous([1, 2, None, 3, None, None]) == False

This will work for any sequence type.
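
For instance, a quick illustration of that claim (my addition, not part of the original answer):

assert is_continuous((None, 1, 2, None)) == True    # tuples work as well
assert is_continuous([None, None, None]) == False    # an all-None input gives False here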

mgilson
7

The natural way to consume sequence elements is to use dropwhile:

from itertools import dropwhile
def continuous(seq):
    return all(x is None for x in dropwhile(lambda x: x is not None,
                                            dropwhile(lambda x: x is None, seq)))

We can express this without nested function calls:

from itertools import dropwhile
def continuous(seq):
    core = dropwhile(lambda x: x is None, seq)
    remainder = dropwhile(lambda x: x is not None, core)
    return all(x is None for x in remainder)
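
To see what the two dropwhile calls leave behind, here is a small illustration (my sketch, using a made-up input):

from itertools import dropwhile

seq = [None, 1, 2, None, 3]
core = dropwhile(lambda x: x is None, seq)             # yields 1, 2, None, 3
remainder = dropwhile(lambda x: x is not None, core)   # yields None, 3
print(all(x is None for x in remainder))               # False: the 3 breaks contiguity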
ecatmur
5

One-liner:

contiguous = lambda l: ' ' not in ''.join('x '[x is None] for x in l).strip()

The real work is done by strip(). The rest of the expression converts the list to a string with an 'x' for each non-None value and a space for each None; if any space survives the strip, it isn't leading or trailing, so the values are not contiguous.
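
To make the intermediate steps concrete (my illustration, not part of the original answer):

# [None, 1, 2, None]  ->  ' xx '  ->  strip()  ->  'xx'   (no space left: contiguous)
# [1, None, 2]        ->  'x x'   ->  strip()  ->  'x x'  (space left: not contiguous)
contiguous([None, 1, 2, None])   # True
contiguous([1, None, 2])         # False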

ugoren
  • Although might there be a problem if the data types in the list were not coercible to strings? It might not have been clear in the question, but I was using integers strictly as an easy example -- I need the algorithm to work for all not None data types. – Clay Wardell Feb 06 '13 at 17:00
  • @ClayWardell If you change `x==None` to `x is None` then it will work for any value type (and will probably be a little faster, as well). – Edward Loper Feb 06 '13 at 18:13
  • Thanks @EdwardLoper, I actually didn't know the `is` operator and thought `==` is good enough here. – ugoren Feb 07 '13 at 10:10
  • `contiguous = lambda l:`? Why name an anonymous function rather than declare a function `def contiguous(l):`? – johnsyweb Feb 14 '13 at 01:38
  • @Johnsyweb, So you can see it all, without a horizontal scroll bar. Not such a good reason, I admit (I probably spend too much time at [codegolf.SE](http://codegolf.stackexchange.com/)). – ugoren Feb 14 '13 at 07:31
  • `c=lambda l:' 'not in''.join('x '[x==None]for x in l).strip()`, I think you will love this, golfer. – Ray Feb 16 '13 at 19:04
3

I did some profiling to compare @gnibbler's approach with the groupby approach. @gnibbler's approach is consistently faster, especially for longer lists. For example, I see about a 50% performance gain for random inputs of length 3-100, where the input has a 50% chance of containing a single int sequence (randomly placed) and otherwise contains random values. Test code is below. I interleaved the two methods (randomly selecting which one runs first) to cancel out any caching effects. Based on this, I'd say that while the groupby approach is more intuitive, @gnibbler's approach may be appropriate if profiling indicates that this is an important part of the overall code to optimize -- in that case, appropriate comments should explain how all/any are used to consume iterator values.

from itertools import groupby
import random, time

def contiguous1(seq):
    # gnibbler's approach
    seq = iter(seq)
    all(x is None for x in seq)        # Burn through any Nones at the beginning
    any(x is None for x in seq)        # and the first group
    return all(x is None for x in seq) # everything else (if any) should be None.

def contiguous2(seq):
    return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1

times = {'contiguous1':0,'contiguous2':0}

for i in range(400000):
    n = random.randint(3,100)
    items = [None] * n
    if random.randint(0,1):
        s = random.randint(0,n-1)
        e = random.randint(0,n-s)
        for i in range(s,e):
            items[i] = 3
    else:
        for i in range(n):
            if not random.randint(0,2):
                items[i] = 3
    if random.randint(0,1):
        funcs = [contiguous1, contiguous2]
    else:
        funcs = [contiguous2, contiguous1]
    for func in funcs:
        t0 = time.time()
        func(items)
        times[func.__name__] += (time.time()-t0)

print
for f,t in times.items():
    print '%10.7f %s' % (t, f)
Edward Loper
2

Here's a solution inspired by numpy. Get the array indices of all the non-null elements. Then, compare each index to the one following it. If the difference is greater than one, there are nulls in between the non-nulls. If there are no indices where the following index is more than one greater, then there are no gaps.

def is_continuous(seq):
    non_null_indices = [i for i, obj in enumerate(seq) if obj is not None]
    for i, index in enumerate(non_null_indices[:-1]):
        if non_null_indices[i+1] - index > 1:
            return False
    return True
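
The same idea written directly with numpy might look something like this (a sketch; `is_continuous_np` is just an illustrative name, not from the original answer):

import numpy as np

def is_continuous_np(seq):
    # Indices of the non-None entries; contiguous means consecutive indices
    # never differ by more than 1.
    idx = np.flatnonzero([x is not None for x in seq])
    return not np.any(np.diff(idx) > 1)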
codewarrior
1

This algorithm does the job, with one drawback: it modifies the list in place (it removes items from it). But it's a solution.

Basically, you strip all leading and trailing Nones from the list; if any None is still left, the integers are not continuous.

def is_continuous(seq):
    while seq and seq[0] is None: del seq[0]
    while seq and seq[-1] is None: del seq[-1]

    return None not in seq

assert is_continuous([1,2,3,None,None]) == True
assert is_continuous([None, 1,2,3,None]) == True
assert is_continuous([None, None, 1,2,3]) == True
assert is_continuous([None, 1, None, 2,3]) == False
assert is_continuous([None, None, 1, None, 2,3]) == False
assert is_continuous([None, 1, None, 2, None, 3]) == False
assert is_continuous([1, 2, None, 3, None, None]) == False

Yet another example of how small code can become evil.

I wish a strip() method were available for list.
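
If mutating the caller's list is a concern, a non-destructive variant of the same idea could work on a copy (a sketch, my addition):

def is_continuous_copy(seq):
    items = list(seq)                        # work on a copy instead of mutating
    while items and items[0] is None: del items[0]
    while items and items[-1] is None: del items[-1]
    return None not in items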

razpeitia
1

My first approach was to use variables to keep track ...

...this ends up with a highly nested and very difficult to follow series of if/else statements embedded in a for loop...

No! Actually you need only one variable. Thinking about this problem in terms of a finite state machine (FSM) turns your approach into quite a nice solution.

We call the state p. At first, p is 0. Then we start walking between the states.

[FSM state-transition diagram; the transitions it shows are the ones encoded in the dict `_D` below]

When all the elements in the list have been examined and we still haven't failed, the answer is True.

One version that encodes the transition table in a dict:

def contiguous(s, _D={(0,0):0, (0,1):1, (1,0):2, (1,1):1, (2,0):2, (2,1):3}):
    p = 0
    for x in s:
        p = _D[p, int(x is not None)]
        if p >= 3: return False
    return True

Another version that uses if statements:

def contiguous(s):
    p = 0
    for x in s:
        if x is None and p == 1 or x is not None and (p == 0 or p == 2):
            p += 1
        if p >= 3: return False
    return True

So my point is that using if and for is still pythonic.

update

I found another way to encode the FSM. We can pack the transition table into a 12-bit integer.

def contiguous(s):
    p = 0
    for x in s:
        p = (3684 >> (4*p + 2*(x is not None))) & 3
        if p >= 3: return False
    return True

Here 3684, the magic number, can be obtained by:

    _D[p,a]     3  2  1  2  1  0
         p      2  2  1  1  0  0
         a      1  0  1  0  1  0
bin(3684) = 0b 11 10 01 10 01 00 

The readability is not as good as the other versions, but it's faster since it avoids the dictionary lookup. The second version is about as fast as this one, but the encoding idea can be generalized to solve more problems.
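
One way to double-check the magic number is to rebuild it from the transition table of the dict version (a small verification sketch, my addition):

_D = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 1, (2, 0): 2, (2, 1): 3}
magic = sum(_D[p, a] << (4*p + 2*a) for p in range(3) for a in range(2))
assert magic == 3684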

Ray
0

Here's a way using just numpy:

import numpy as np

a = np.array([1, 2, 3, np.nan, 4, 5, np.nan, 6, 7])

# This returns the indices of the non-nan entries
# (nan != nan, so `a == a` is False only at the nans)
# eg. [[0], [1], [2], [4], [5], [7], [8]]
# use .squeeze() to convert to [0, 1, 2, 4, 5, 7, 8]
aa = np.argwhere(a == a).squeeze()

# use a diff on the indices: if the non-nan values are
# continuous, every diff is 1; if not, some diff is > 1
# and any() returns True (i.e. True means NOT continuous)
any(np.diff(aa) > 1)
thomas.mac