
I'm working with lists that look as follows:

[2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35]

In the end I want to extract only the first and last integer in a consecutive series, as such:

[(2,8),(13,20),(30,35)]

I am new to Python. Below is my attempt at solving this problem:

helix = []
single_prot_helices = []
for ind,pos in enumerate(prot[:-1]):
    if pos == prot[ind+1]-1: #if 2 == 3-1 essentially
        helix.append(pos)
    elif pos < prot[ind+1]-1: #if 8 < 13-1 for example
        helix.append(pos)
        single_prot_helices.append(helix) #save in a permanent list, clear temp list
        helix.clear()

In this case prot is a list just like the example above. I expected single_prot_helices to look something like this:

[[2,3,4,5,6,7,8],[13,14,15,16,17,18,19,20],[30,31,32,33,34,35]]

and at this point it would have been easy to get the first and last integer from these lists and put them in a tuple, but instead of the expected list I got:

[[20,30,31,32,33,34,35],[20,30,31,32,33,34,35]]

Only the last series of numbers was returned, and I got one fewer list than expected (expected 3, received 2). I don't understand where I made a mistake, since I believe my code follows my logic: look at the number (pos) and at the next number. If the next number is larger by 1, add pos to a list (helix). If the next number is larger by more than 1, add pos to helix, append helix to a permanent list (single_prot_helices), and then clear helix to prepare it for the next series of numbers.

Any help will be highly appreciated.

plmnkndv

4 Answers

3

You could do something like this:

foo = [2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35]
series = []
result = []
for i in foo:
    # if the series is empty or the element is consecutive
    if (not series) or (series[-1] == i - 1):
        series.append(i)
    else:
        # append a tuple of the first and last item of the series
        result.append((series[0], series[-1]))
        series = [i]
# needed in case foo is empty
if series:
    result.append((series[0], series[-1]))
print(result) # [(2, 8), (13, 20), (30, 35)]

Or, as a generator:

def generate_series(list_of_int):
    series = []
    for i in list_of_int:
        if not series or series[-1] == i - 1:
            series.append(i)
        else:
            yield (series[0], series[-1])
            series = [i]
    if series:
        yield (series[0], series[-1])

foo = [2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35]
print([item for item in generate_series(foo)]) # [(2, 8), (13, 20), (30, 35)]

Your code has a few problems. The main one is that helix always refers to the same list object: you only ever clear it, never replace it. As a result you append the same list several times, which is why every entry in the result is identical.
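A minimal sketch of that aliasing, using made-up names (temp, saved, saved_fresh) that are not from the question: appending a list stores a reference to the same object, so clearing it afterwards also empties what was saved, while rebinding the name leaves the saved list alone.

```python
temp = [1, 2, 3]
saved = []
saved.append(temp)            # stores a reference to temp, not a copy
aliased = saved[0] is temp    # True: both refer to one list object
temp.clear()                  # empties that one object, so saved sees it too

temp = [4, 5, 6]
saved_fresh = []
saved_fresh.append(temp)
temp = []                     # rebinds the name; the saved list is untouched

print(saved, saved_fresh)     # [[]] [[4, 5, 6]]
```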

The first fix is to assign a new list to helix rather than clearing.

prot = [2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35]
helix = []
single_prot_helices = []
for ind,pos in enumerate(prot[:-1]):
    if pos == prot[ind+1]-1: #if 2 == 3-1 essentially
        helix.append(pos)
    elif pos < prot[ind+1]-1: #if 8 < 13-1 for example
        helix.append(pos)
        single_prot_helices.append(helix) #save in a permanent list, clear temp list
        helix = []
print(single_prot_helices) # [[2, 3, 4, 5, 6, 7, 8], [13, 14, 15, 16, 17, 18, 19, 20]]

As you can see, the last list is missing. That is because the last helix is never appended.

You could add:

if helix:
    single_prot_helices.append(helix)

But that still only gives you:

[[2, 3, 4, 5, 6, 7, 8], [13, 14, 15, 16, 17, 18, 19, 20], [30, 31, 32, 33, 34]]

This leaves out the last element, since the loop only iterates up to the second-to-last one.

That means you would need to do something complicated and confusing like this after your loop:

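if helix:
    if helix[-1] == prot[-1] - 1:
        # the last element continues the current series
        helix.append(prot[-1])
        single_prot_helices.append(helix)
    else:
        single_prot_helices.append(helix)
        single_prot_helices.append([prot[-1]])  # wrap in a list, not a bare int
else:
    # the last element is a series of its own
    single_prot_helices.append([prot[-1]])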

Giving you:

[[2, 3, 4, 5, 6, 7, 8], [13, 14, 15, 16, 17, 18, 19, 20], [30, 31, 32, 33, 34, 35]]
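From there, the (first, last) tuples the question asks for could be built with a list comprehension. A short sketch, assuming no inner list is empty; the name bounds is made up here:

```python
single_prot_helices = [
    [2, 3, 4, 5, 6, 7, 8],
    [13, 14, 15, 16, 17, 18, 19, 20],
    [30, 31, 32, 33, 34, 35],
]
# Take the first and last element of each series (assumes no series is empty).
bounds = [(h[0], h[-1]) for h in single_prot_helices]
print(bounds)  # [(2, 8), (13, 20), (30, 35)]
```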

If you're still confused by names and mutability, Ned Batchelder does a wonderful job of explaining the concepts with visual aids.

Axe319
    Thank you a lot for the help. It worked instantly and I just continued with my work without thinking of leaving a 'thank you'. So here I am a month and a half later, again, thanks so much. – plmnkndv Jan 23 '23 at 15:20
0

You can try this one that uses zip().

res = []
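start = prot[0]  # first element of the current series (prot must not be empty)
for x, y in zip(prot, prot[1:]):
    if y - x > 1:  # gap found: x closes the current series, y opens the next
        res.append((start, x))
        start = y
res.append((start, prot[-1]))  # close the final series
print(res)  # [(2, 8), (13, 20), (30, 35)]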
0x0fba
0

The solution of @Axe319 works but it doesn't explain what is happening with your code.

Appending a list to another list stores a reference to it (what you might call a pointer), not a copy of its contents. To store an independent copy, use copy().

When you add helix to single_prot_helices, you add a reference to that list, so you will have:

# helix = [2, 3, 4, 5, 6, 7, 8]
# single_prot_helices = [[2, 3, 4, 5, 6, 7, 8]]

And when you do helix.clear() you will have:

# helix = []
# single_prot_helices  = [[]]

Why? Because single_prot_helices holds a reference to the list, not a copy of its elements.

After the second series has been appended (and before helix is cleared again), you will have:

# helix = [13, 14, 15, 16, 17, 18, 19, 20]
# single_prot_helices = [[13, 14, 15, 16, 17, 18, 19, 20], [13, 14, 15, 16, 17, 18, 19, 20]]

Why are there two lists in single_prot_helices? Because you appended a second reference to the same list, and the first one was still there.
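One way to see the difference is to append a snapshot made with helix.copy() before clearing. A minimal sketch with a shortened, made-up input; it only illustrates the copy and still does not save the final series:

```python
prot = [2, 3, 4, 8, 9, 20, 21]  # shortened example input (an assumption)
helix = []
single_prot_helices = []
for ind, pos in enumerate(prot[:-1]):
    helix.append(pos)
    if pos < prot[ind + 1] - 1:                   # gap: the current series ends here
        single_prot_helices.append(helix.copy())  # independent snapshot
        helix.clear()                             # the snapshot is unaffected
print(single_prot_helices)  # [[2, 3, 4], [8, 9]]
```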

Adding some print calls to your code makes this visible. Note that I added 40, 41 to the list so that the pattern is easier to see:

prot = [2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35,40, 41]
helix = []
single_prot_helices = []
for ind,pos in enumerate(prot[:-1]):
    if pos == prot[ind+1]-1: #if 2 == 3-1 essentially
        helix.append(pos)
    elif pos < prot[ind+1]-1: #if 8 < 13-1 for example
        helix.append(pos)
        print("helix")
        print(helix)

        single_prot_helices.append(helix) #save in a permanent list, clear temp list
        print(single_prot_helices)
        helix.clear()
        print("clear")
        print(helix)
        print(single_prot_helices)
Aymen
0

In a single line

data = [2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35]
mask = iter(data[:1] + sum(([i1, i2] for i1, i2 in zip(data, data[1:] + data[:1]) if i2 != i1+1), []))
print(list(zip(mask, mask)))

Gives:

[(2, 8), (13, 20), (30, 35)]

Functional approach

Use groupby and count from itertools. Within a run of consecutive numbers, the difference between each item and its index stays the same, so that difference can serve as the grouping key. The count() iterator supplies the running index.

data = [2,3,4,5,6,7,8,13,14,15,16,17,18,19,20,30,31,32,33,34,35]
from itertools import groupby, count
final = []
c = count()

for key, group in groupby(data, key = lambda x: x-next(c)):
    block = list(group)
    final.append((block[0], block[-1]))

print(final)

Also gives:

[(2, 8), (13, 20), (30, 35)]
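The same grouping key can also be computed without a shared counter. A sketch using enumerate, assuming the input is sorted and has no duplicates; the name runs is made up here:

```python
from itertools import groupby

data = [2, 3, 4, 5, 6, 7, 8, 13, 14, 15, 16, 17, 18, 19, 20, 30, 31, 32, 33, 34, 35]
runs = []
# value - index is constant within a run of consecutive integers
for _, group in groupby(enumerate(data), key=lambda pair: pair[1] - pair[0]):
    block = [value for _, value in group]
    runs.append((block[0], block[-1]))
print(runs)  # [(2, 8), (13, 20), (30, 35)]
```

Because the key depends only on each (index, value) pair, the loop can be rerun without recreating a counter.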