Context: I have a main list called lst_separated that contains sublists, and I'd like to go through each element of a sublist using a while loop, but I can't figure out how. I also have a second list, populated_indices, that holds the index of each sublist inside lst_separated. The goal is to pick a sublist (let's say this one: ['230320','230400','25020','26232','302320','312320']), go through each pair (the elements at i and i+1; for example, '230320' (even index) and '230400' (odd index) would be a pair, '25020' and '26232' would be another pair, etc.) and use these positions, as integers, to build the variable var.

Code:

lst_separated = [['1605',1607],['230320','230400','25020','26232'],['230320','230400','25020','26232','302320','312320'],...]

populated_indices = [17,430,678...]

outcome = []
for seqName,index in zip(pca_df['seqName'], range(0,len(pca_df))):
    i = 0
    j = 1
    while j < len(lst_separated[populated_indices[index]]):
        pos1 = int(lst_separated[populated_indices[index]][i])
        i += 2
        pos2 = int(lst_separated[populated_indices[index]][j])
        j += 2
        #print(pos1,pos2)
        var = str(record_dict_2[seqName].seq)[pos1-1:pos2-1]
        outcome.append(var)
        
    #print(lst_separated[subindex])


#print(populated_indices)

I can get the positions when each sublist is 2 elements long, but I eventually hit an IndexError: list index out of range in the while loop, and I don't think it is looping through every element of each sublist as I want it to (it only handles the first 2 elements, even if the sublist has 4 or 6 elements, for example). Is there a better way to do this? Possibly with itertools?

Thank you in advance!

  • What is `pca_df` and how is it related to these lists? Is `populated_indices` at least as long as `pca_df`? – Barmar Nov 25 '21 at 00:13
  • It looks to me like your `zip()` is doing the same thing as `enumerate(pca_df['seqName'])` – Barmar Nov 25 '21 at 00:14
  • Why is `1607` not in quotes, is it expected that `lst_separated ` has both `int` and `str` in its sublists? Will sublists of `lst_separated` always have even numbers of elements? Is `populated_indices` obtained from `pca_df` (and if so, how)? – Grismar Nov 25 '21 at 00:14
  • See https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks for how to group a list into pairs (or chunks of any size). – Barmar Nov 25 '21 at 00:15
  • `pca_df` is the dataframe that contains some sample names, and I used it to create the range. And yes, I'm using `zip()` like `enumerate`, but without the output being a tuple. `1607` not being in quotes was a typo; all elements of the sublists are strings. `populated_indices` was built beforehand with a for loop and an if statement. Basically, if the value is `NaN` then skip it, otherwise save the index (this is the index that contains the sublists) –  Nov 25 '21 at 00:21
  • I'll try to group the list into pairs as suggested @Barmar, thank you for the input –  Nov 25 '21 at 00:22

1 Answer

Based on the information you gave:

  • The data is well organized (each key is immediately followed by its value, so keys sit at even indices and values at odd indices)

This does not solve the whole problem, but here is an example of how to access your data in an organized way:

for element in lst_separated:
    print( "keys", element[0::2] )
    print( "values", element[1::2] )
    print("\n")
    print("\t--- now matching key and value is easy ---")
    print("\n")

To match both lists you can take a look at this question.

Results:

keys ['1605']
values ['1607']

    --- now matching key and value is easy ---

keys ['230320', '25020']
values ['230400', '26232']

    --- now matching key and value is easy ---

keys ['230320', '25020', '302320']
values ['230400', '26232', '312320']

    --- now matching key and value is easy ---
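
As a possible next step, the two slices can be zipped together to get integer pairs directly. This is only a minimal sketch: it assumes every element is a numeric string (as the asker stated in the comments), `position_pairs` is a hypothetical helper name, and the sample data is copied from the question.

```python
# Sample data mirroring the question; every position is assumed to be a string.
lst_separated = [
    ["1605", "1607"],
    ["230320", "230400", "25020", "26232"],
    ["230320", "230400", "25020", "26232", "302320", "312320"],
]


def position_pairs(sublist):
    """Return (start, end) integer pairs from a flat list of position strings.

    Hypothetical helper: sublist[0::2] holds the even-indexed items and
    sublist[1::2] the odd-indexed ones. zip() stops at the shorter slice,
    so a trailing unpaired element is silently ignored.
    """
    return [(int(start), int(end)) for start, end in zip(sublist[0::2], sublist[1::2])]


for sublist in lst_separated:
    print(position_pairs(sublist))
```

Each `(pos1, pos2)` pair could then replace the manual `i`/`j` bookkeeping in the question's while loop, for instance by looping over `position_pairs(lst_separated[populated_indices[index]])` and slicing the sequence with `[pos1-1:pos2-1]` as before. Since no index is ever incremented past the end of the sublist, the pairing itself cannot raise an IndexError; an out-of-range value in `populated_indices` still would.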
emichester