
I'm working with scientific data and using a module called pysam to get the reference positions for each unique "object" in my file.

In the end, I obtain a "list of lists" that looks like this (here is an example with only two objects in the file):

pos = [[1,2,3,6,7,8,15,16,17,20],[1,5,6,7,8,20]]

For each list in pos, I would like to iterate over the values and compare value[i] with value[i+1]. When the difference is greater than 2 (for example), I want to store both values (value[i] and value[i+1]) in a new list.

If we call it final_pos, then I would like to obtain:

final_pos = [[3,6,8,15,17,20],[1,5,8,20]]

It seemed rather easy at first, but I must be lacking some basic knowledge of how lists work: I can't manage to iterate over the values of each list and compare consecutive values with each other. If anyone has an idea, I'm more than willing to hear it!

Thanks in advance for your time!

EDIT: Here's what I tried:

pos = [[1,2,3,6,7,8,15,16,17,20],[1,5,6,7,8,20]]    

final_pos = []

for list in pos:
    for value in list:
        for i in range(len(list)-1):
            if value[i+1]-value[i] > 2:
                final_pos.append(value[i])
                final_pos.append(value[i+1])
Florian Bernard
  • If you have some non-working code, we'll be happy to help you fix it. – Jean-François Fabre May 02 '18 at 19:30
  • It's possible to do this using a nested list comprehension, but the code will be more readable doing it with "traditional" `for` loops. Are you familiar with the `zip` function? – PM 2Ring May 02 '18 at 19:36
  • I added an example of what I tried, but honestly I've tried so many different things that I'm starting to mix everything up! I'm trying to learn Python by myself, so there are probably rookie mistakes everywhere. To answer your question @PM2Ring: no, I'm not very familiar with the zip function. – Florian Bernard May 02 '18 at 19:41
  • 1. Can there be duplicate elements in the nested lists? 2. Will the elements in List always be in order (ascending or descending)? – Shubham May 02 '18 at 19:46
  • I've attempted to fix your code, but yes, this approach can add an element more than once. That doesn't show with your input data, but it does with `pos = [[1,2,3,6,7,8,11,15,16,17,20],[1,5,6,7,8,20]]` – Jean-François Fabre May 02 '18 at 19:46
  • @Shubham 1. two lists can be the same, but within a list, there shouldn't be any duplicates as it corresponds to genomic positions on DNA. So one position is unique and can only be here once. 2. The elements are in ascending order. – Florian Bernard May 02 '18 at 19:51
  • @FlorianBernard If the above two conditions hold, then I guess [my answer](https://stackoverflow.com/a/50142208/3160529) should work fine. – Shubham May 02 '18 at 19:53

2 Answers


You can iterate over each of the individual lists in pos and compare consecutive values. When you need to insert values, you can use a temporary set, because you don't want to insert the same element twice in your final list. Then convert the temporary set to a sorted list (to restore the order) and append it to your final list. Note that sorting only reproduces the original order if the elements in the original list are already sorted.

pos = [[1,2,3,6,7,8,15,16,17,20],[1,5,6,7,8,20]]
final_pos = []

for l in pos:
    temp_set = set()
    for i in range(len(l)-1):
        if l[i+1] - l[i] > 2:
            temp_set.add(l[i])
            temp_set.add(l[i+1])

    final_pos.append(sorted(temp_set))

print(final_pos)

Output

[[3, 6, 8, 15, 17, 20], [1, 5, 8, 20]]
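A comment under the question mentions zip. As a rough sketch only, the same pairwise comparison could be written without indexes by zipping each list with itself shifted by one. The function name gap_boundaries and the min_gap parameter are made up for this illustration, and it still assumes each inner list is sorted in ascending order:

```python
def gap_boundaries(positions, min_gap=2):
    """Return the sorted values that border a gap larger than min_gap.

    Hypothetical helper, not part of the answer above; it assumes
    `positions` is sorted in ascending order.
    """
    kept = set()
    # zip(positions, positions[1:]) yields each consecutive pair once
    for left, right in zip(positions, positions[1:]):
        if right - left > min_gap:
            kept.update((left, right))
    return sorted(kept)


pos = [[1, 2, 3, 6, 7, 8, 15, 16, 17, 20], [1, 5, 6, 7, 8, 20]]
final_pos = [gap_boundaries(sublist) for sublist in pos]
print(final_pos)  # [[3, 6, 8, 15, 17, 20], [1, 5, 8, 20]]
```

The set plays the same role as temp_set above: a value that borders two gaps is kept only once.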

Edit: About what you tried:

for list in pos:

This line will give us list = [1,2,3,6,7,8,15,16,17,20] (in the first iteration)

for value in list:

This line will give us value = 1 (in the first iteration)

Now, value is just a number, not a list, and hence value[i] and value[i+1] don't make sense.
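To see this concretely, here is a small sketch (the variable names are chosen for the illustration) that reproduces the failure on a single number and then indexes the sublist instead:

```python
pos = [[1, 2, 3, 6, 7, 8, 15, 16, 17, 20], [1, 5, 6, 7, 8, 20]]

value = pos[0][0]  # 1, a plain int, like `value` in the innermost loop
try:
    value[0]  # an int cannot be indexed
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Indexing the sublist itself is what the comparison needs
sublist = pos[0]
first_diff = sublist[1] - sublist[0]
print(first_diff)  # 1
```

So the middle loop (for value in list) can simply be dropped, and the index i applied to the sublist directly.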

Shubham

Your code has an obvious "too many loops" issue. It also stores the result as a flat list, whereas you need a list of lists.

It also has a more subtle bug: the same index can be added more than once if two intervals match in a row. I've recorded the added indices in a set to avoid this.

The bug doesn't show with your original data (which tripped up a lot of experienced users, including me), so I've changed it:

pos = [[1,2,3,6,7,8,11,15,16,17,20],[1,5,6,7,8,20]]

final_pos = []

for value in pos:
    sublist = []
    added_indexes = set()
    for i in range(len(value)-1):
        if value[i+1]-value[i] > 2:
            if i not in added_indexes:
                sublist.append(value[i])
                ## added_indexes.add(i)  # we don't need to add it, we won't go back
            # no need to test for i+1, it's new
            sublist.append(value[i+1])
            # registering it for later
            added_indexes.add(i+1)
    final_pos.append(sublist)

print(final_pos)

result:

[[3, 6, 8, 11, 15, 17, 20], [1, 5, 8, 20]]
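For comparison, here is a sketch of what happens without the duplicate check. The helper name naive_boundaries is invented for this illustration; it just appends both ends of every matching interval:

```python
def naive_boundaries(values, min_gap=2):
    """Append both ends of every large gap, with no duplicate check."""
    out = []
    for i in range(len(values) - 1):
        if values[i + 1] - values[i] > min_gap:
            out.append(values[i])
            out.append(values[i + 1])
    return out


original = [1, 2, 3, 6, 7, 8, 15, 16, 17, 20]
changed = [1, 2, 3, 6, 7, 8, 11, 15, 16, 17, 20]

print(naive_boundaries(original))  # [3, 6, 8, 15, 17, 20]
print(naive_boundaries(changed))   # [3, 6, 8, 11, 11, 15, 17, 20]
```

With the changed data, 11 closes one interval (8 to 11) and opens the next (11 to 15), so it is appended twice.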

Storing the indexes in a set, rather than the values (which would also work here, with a sort as post-processing, see this answer), also works when the objects aren't hashable (like custom objects with a custom distance defined between them) or when the list is only partially sorted (in "waves"), if that is of any interest (e.g. pos = [[1,2,3,6,15,16,17,20,1,6,10,11],[1,5,6,7,8,20,1,5,6,7,8,20]])
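The difference on partially sorted input can be sketched as follows, using the first "wave" list from the example. The function names by_index and by_value are hypothetical; by_index mirrors the loop above and by_value mirrors the set-of-values approach followed by a sort:

```python
def by_index(values, min_gap=2):
    """Deduplicate by position, keeping the original order."""
    out, added = [], set()
    for i in range(len(values) - 1):
        if values[i + 1] - values[i] > min_gap:
            if i not in added:
                out.append(values[i])
            out.append(values[i + 1])
            added.add(i + 1)
    return out


def by_value(values, min_gap=2):
    """Deduplicate by value, then sort the result."""
    kept = set()
    for i in range(len(values) - 1):
        if values[i + 1] - values[i] > min_gap:
            kept.update((values[i], values[i + 1]))
    return sorted(kept)


waves = [1, 2, 3, 6, 15, 16, 17, 20, 1, 6, 10, 11]
print(by_index(waves))  # [3, 6, 15, 17, 20, 1, 6, 10]
print(by_value(waves))  # [1, 3, 6, 10, 15, 17, 20]
```

The index-based version keeps the second 6 and the order of the waves, while the value-based version merges equal values and reorders everything.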

Jean-François Fabre
  • If I understand correctly, in order not to add the same value[i] to sublist twice, you create a collection called added_indexes to which you add the index every time you add a value to sublist, and you check added_indexes every time you want to add value[i] to be sure i is not already in there. But then, why not check sublist for value[i] directly? – Florian Bernard May 02 '18 at 20:13
  • Yes, but checking the sublist is much slower for big lists (set lookup is faster). Besides, this is a general solution that can handle lists of objects that have a distance from each other, and it works with partially sorted lists. – Jean-François Fabre May 02 '18 at 20:15
  • It just seemed odd to me but if it's faster I'm not complaining at all ! Thanks for your time, it's greatly appreciated :) – Florian Bernard May 02 '18 at 20:16
  • It was a pleasure trying to solve this. Appeared trivial at first, yes, but it wasn't. It's faster if the list is really huge. Otherwise the difference isn't noticeable. – Jean-François Fabre May 02 '18 at 20:17