3

Suppose I have a list like this:

lst = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48] 

I need to extract the elements which have a difference of one. I need final output to look like this:

[[3, 4, 5], [20, 21, 22, 23], [47, 48]] 

Here is what I have tried so far for this particular problem which has yielded some progress but doesn't achieve 100% of what I need:

final_list = []
for i in range(len(lst)):
    sub_list = []
    for ii in range(i, len(lst)):
        prev_num = lst[ii-1]
        if lst[ii] - prev_num == 1:
#             print(lst[ii], end=",")
            sub_array.append(lst[ii])
        else:
            break
    if sub_list:
        final_list.append(sub_list)
print(final_list)

Output:

[[4, 5], [5], [21, 22, 23], [22, 23], [23], [48]]
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
Hammad
  • 529
  • 7
  • 17

5 Answers5

6

You can use itertools.groupby to group the items with a key function that subtracts each item value with an incremental counter, which would result in a fixed value for each group of consecutive integers:

from itertools import groupby, count

lst = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48]
c = count()
final_list = [g for _, [*g] in groupby(lst, lambda t: t - next(c)) if len(g) > 1]

final_list would become:

[[3, 4, 5], [20, 21, 22, 23], [47, 48]]

EDIT: If you'd rather have better performance than more concise code, you can look at @MadPhysicist's answer, which avoids overhead incurred from calling generator functions, and @don'ttalkjustcode's answer, which avoids creating lists until a pair of consecutive numbers are found. Combine the two answers and you get a solution that avoids both types of overhead:

out = []
prev = float('inf')
for i in lst:
    if i - prev != 1:
        current = None
    elif current:
        current.append(i)
    else:
        current = [prev, i]
        out.append(current)
    prev = i

Sample timing using @don'ttalkjustcode's benchmark code:

2716 μs  Mad_Physicist
1554 μs  dont_talk_just_code
1284 μs  Mad_Physicist_x_dont_talk_just_code

Try it online!

blhsing
  • 91,368
  • 6
  • 71
  • 106
  • 2
    This is the most efficient answer in terms of time efficiency and readability. If only I could accept multiple answers. – Hammad Sep 09 '21 at 03:19
  • update: it is actually quite slower than the other two methods for significantly larger lists – Hammad Sep 09 '21 at 03:23
  • Is it not a duplicate like [the other one](https://stackoverflow.com/q/69111269/16759116) you closed an hour earlier? – no comment Sep 09 '21 at 03:54
  • 1
    @don'ttalkjustcode I thought about closing it as a duplicate indeed but since this question has a somewhat unique requirement of skipping standalone items, I thought it would be apt to show how it can be done in a concise way with the help of unpacking during assignment of a comprehension, which I had never seen in an answer for this type of question before. – blhsing Sep 09 '21 at 04:02
  • 1
    @blhsing Alright, now I also posted one with similar reasoning. – no comment Sep 09 '21 at 04:46
  • [generator.send version](https://tio.run/##hVDRaoQwEHzPV8yjlrSc3pVKof0REenVGBc0CXGl9evtxvNK37owZCczDMOGlQfvzlWI29ZHP4FpMsSgKfjIB1O7Qmwiez/Od9FGv4TrqvHpF8dKdabHdRxmcrb9Ih5aa5yJH@xjOxvXZePM@auCTHKKmB00DeMNp1/W@wgCuVv0X9/du5IZO9keQbtm5W@P3Jkz35zZ2x4NL9Ghtntqq1E/2CZlH/1TLw37lDrmoB6jxNgc7ygapUSU5LrQOGtcNJ41Ko3iJBBWyluKVpaCZBB@eRFUDZQKkaT8fyfJt@0H) of yours, ~10% faster but not pretty. – no comment Sep 09 '21 at 05:23
  • Interesting... I had tried something like that `Mad_Physicist_x_dont_talk_just_code` but it wasn't faster. Seems to be because I had initialized with `prev = None` (which I find nicer) and then checked `i - 1 == prev`. I guess yours is faster because `i - prev` is among those small ints that are cached (-5 to 256 or so) so you save the int object creations. – no comment Sep 09 '21 at 07:03
  • Isolated [test](https://tio.run/##RY7BCsIwEETv@Yq5SBKJ0KCIFXr0K0REcdUckobtVvDra2JB57TzeCyT3/Ls03qXeZru3EdIiBQEIeaeBUyZLqLUQDJmdNCZ6eUQytm2rYNvmkarQ6lHhRIdsIJH16GK2v1h7ZX7Ak9K3XvGGSGBL@lBZm33X7Viqvgw9xop32NIZt5iyOG7xiGN8Urc@U0ZYe3PzxySGL24YRygsYARLOFpax1o1mbFTtMH) of my previous comment. Same number of checks as in the real benchmark, and shows similar time difference (about 250-300 us). – no comment Sep 09 '21 at 07:18
  • @don'ttalkjustcode Very good catch. I unwittingly used the uglier but better performing approach only because I hadn't thought about using `i - 1 == prev` to avoid having to perform arithmetic operations with `prev`. Prettier isn't necessarily faster, unfortunately. – blhsing Sep 09 '21 at 07:45
2

You can step along the input list and start adding elements to an output list. Any time the difference is greater than 1, start a new output list. If the prior output is length 1, remove it:

out = []
current = []
prev = None
for i in lst:
    if prev is None or i - prev != 1:
        if len(current) > 1:
            out.append(current)
        current = []
    current.append(i)
    prev = i
if len(current) > 1:
    out.append(current)
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
2

A zip solution that adds each first pair of a streak as an inner list and then appends further elements of the streak:

out = []
inner = None
for x, y in zip(lst, lst[1:]):
    if y - x == 1:
        if inner:
            inner.append(y) 
        else:
            inner = [x, y]
            out.append(inner)
    else:
        inner = None

Benchmark with a random list 1000 as long as your example and with same density:

5172 μs  blhsing
2697 μs  Mad_Physicist
7667 μs  Osman_Mamun
1571 μs  dont_talk_just_code

Benchmark code (Try it online!):

from timeit import timeit
from itertools import groupby, count
from random import sample

def blhsing(lst):
    c = count()
    return [g for _, [*g] in groupby(lst, lambda t: t - next(c)) if len(g) > 1]

def Mad_Physicist(lst):
    out = []
    current = []
    prev = None
    for i in lst:
        if prev is None or i - prev > 1:
            if len(current) > 1:
                out.append(current)
            current = []
        current.append(i)
        prev = i
    if len(current) > 1:
        out.append(current)
    return out

def Osman_Mamun(lst):
    out = []
    for k, g in groupby(enumerate(lst), key=lambda i: i[1]-i[0]):
        temp = list(g)
        if len(temp) > 1:
            out.append([i[1] for i in temp])
    return out

def dont_talk_just_code(lst):
    out = []
    inner = None
    for x, y in zip(lst, lst[1:]):
        if y - x == 1:
            if inner:
                inner.append(y) 
            else:
                inner = [x, y]
                out.append(inner)
        else:
            inner = None
    return out


lst_example = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48] 
funcs = blhsing, Mad_Physicist, Osman_Mamun, dont_talk_just_code

for _ in range(3):
    lst = sorted(sample(range(max(lst_example) * 1000), len(lst_example) * 1000))
    # lst = lst_example
    results = []
    for func in funcs:
        t = timeit(lambda: results.append(func(lst)), number=100) / 100
        print('%d μs ' % (t * 1e6), func.__name__)
    assert all(result == results[0] for result in results), (list(map(len, results)), results)
    print()
no comment
  • 6,381
  • 4
  • 12
  • 30
  • Nice approach. +1 I like how you avoid creating a list until a pair of consecutive numbers are found. Generators are neat but incur overhead from repeated function calls, however. I've mixed your code with @MadPhysicist to avoid generators and avoid creating lists unnecessarily and got somewhat better performance as a result. You can find it in my updated answer. – blhsing Sep 09 '21 at 06:37
  • @blhsing Hmm, if you mean my generator version of your original solution, I'd say that *saved* function calls. For every element, you had the calls of your lambda, of `next`, and of the `count` iterator's `__next__` function. While I had only the call to my generator's `send` function. – no comment Sep 09 '21 at 06:47
  • By generator I mean any generator, including your use of `zip` and my use of `groupby` (which generates a sequence of generators, which is why `groupby` performs so badly). But yeah your use of `send` is pretty ingenius and does shave some overhead off of my solution, albeit only marginally. – blhsing Sep 09 '21 at 07:49
  • @blhsing Ah, ok. I'm going with the [Python terminology](https://docs.python.org/3/glossary.html#term-generator), where [none of those](https://tio.run/##RY/dCsMgDIXvfYpcKsgY7GYMdr2HGGW0YjuhNRJTaPfyTqU/uUvOl3OSsPIX/e0eKKWecAKD42gNO/Tx0nYG3BSQGF7WW2oZSVTKsSVGHOOuD4Rz6Fad92fPQhSg4BGe8BaQSy7QI8ECzsO7UboOKy235ueCPJTN8Bx4u7AcqsdHw1BsDubaqEw1QhR1jy7EccajegRyOc5F5yO33li56/p8UOljS6X0Bw) are generators. – no comment Sep 09 '21 at 08:08
  • Well in your example you actually call the generators to make them return iterators: https://tio.run/##jY5BCoMwEEX3nmKWCUgpdNMblJ6hSNEQ7UCcCeMI6uXTGsSNtHT933@8OOuL6XKNklIr3IPjELxTZBoA@8iiUDeuyBuqF2UO@9IJj7GZy89rJC2KKEhqcEAatCbnjZmgZYEJkOBR2XJ1nW6evNTKYu3xkU1mI@/6HVwwml35C9wi/4PJT2q6HP0soVu79/@5sgdDSm8 – blhsing Sep 09 '21 at 08:22
  • But then again, `isinstance(zip, abc.Generator)` returns `False` either, so I don't know what functions like `zip`, which return iterators, should be officially classified as then. Personally I would call what functions like a generator, a generator. – blhsing Sep 09 '21 at 08:24
  • @blhsing Yes, I call them because the `isinstance(..., abc.Generator)` test actually checks for generator iterators, not generator functions. How to classify `zip` etc themselves? Maybe "iterator producers" :-). I'm not aware of a standard term. – no comment Sep 09 '21 at 08:30
1

Using itertools.groupby:

In [19]: out = []

In [20]: for k, g in groupby(enumerate(lst), key=lambda i: i[1]-i[0]):
    ...:     temp = list(g)
    ...:     if len(temp) > 1:
    ...:         out.append([i[1] for i in temp])
    ...:

In [21]: out
Out[21]: [[3, 4, 5], [20, 21, 22, 23], [47, 48]]
Osman Mamun
  • 2,864
  • 1
  • 16
  • 22
  • concise and easy to read, but has slightly higher time complexity – Hammad Sep 09 '21 at 03:17
  • 1
    No, this has the same time complexity as the other two answers. – blhsing Sep 09 '21 at 03:19
  • the accepted answer is roughly twice as fast than this solution – Hammad Sep 09 '21 at 03:25
  • 2
    Time complexity means how its performance scales with the size of the inputs, not how it performs relative to another algorithm at a certain fixed input size. This answer is slower than the accepted one because it has more overhead, but not because it has a higher time complexity. – blhsing Sep 09 '21 at 03:29
  • 1
    got it, my bad. really appreciate your explanation. – Hammad Sep 09 '21 at 03:40
1

You can go through the elements and either add them as a new group or add them to the last group depending on whether it is one more than the last one added. When done, remove the single element groups.

lst = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48]

groups = []
for n in lst:
    if groups and n-1 == groups[-1][-1]:
        groups[-1].append(n)              # +1 element, add to last group
    else:
        groups.append([n])                # add as a new group
groups = [g for g in groups if len(g)>1]  # remove single element groups

print(groups)
[[3, 4, 5], [20, 21, 22, 23], [47, 48]]
Alain T.
  • 40,517
  • 4
  • 31
  • 51