origCodon = [orig[i:i + groupSize] for i in range(len(orig) + 1 - groupSize)]
patCodon = [pat[i:i + groupSize] for i in range(len(pat) + 1 - groupSize)]
print(patCodon)
origCode = []
patCode = []
for p in patCodon:
    for d in dna:
        if d == p:
            x = dna[p]
            print(p)
            patCode.append(x)

The code above takes two lists and is supposed to split each into groups of three. But when I check the individual elements, each group of three starts only one element after the previous one, so the groups overlap.

For example, this is one of the lists it produces:

['AAC', 'ACT', 'CTG', 'TGC', 'GCA', 'CAG', 'AGC', 'GCT', 'CTC', 'TCA']

But these are the elements it checks:

AAC
ACT
CTG
TGC
GCA
CAG
AGC
GCT
CTC
TCA

How do I make it so that each group of three is checked and then it moves on to the next?

My list is split into groups of three, which become the items of a new list. I want to look up each of those items in a dictionary to find its corresponding amino acid, but the program keeps producing overlapping groups. For example, if the user enters AAATTT, the program checks:

AAA
AAT
ATT
TTT

rather than just AAA and TTT.

Hayley van Waas

3 Answers


There are two ways to do this: slices, or a shared iterator.

The other answers show the slice method, which I think you could have gotten right if you had remembered to pass a step of 3 to range:

[lst[i:i+3] for i in range(0, len(lst), 3)]

The only major downside of this method is that it only works on a list or other sequence, not a general iterable. In your current code, this doesn't matter, because the thing you want to call it on is a list.
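As a rough illustration of the slice method and its limitation, here is a minimal sketch. The helper name chunk_sequence is made up for this example; it is not from the question's code.

```python
def chunk_sequence(seq, size):
    """Split a sequence into consecutive, non-overlapping slices of length `size`."""
    # The third argument to range is the step: start a new slice every `size` items.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


print(chunk_sequence("AAATTT", 3))        # ['AAA', 'TTT']
print(chunk_sequence("AACTGCAGCTCA", 3))  # ['AAC', 'TGC', 'AGC', 'TCA']

# A generator has no len() and cannot be sliced, so the same call fails.
try:
    chunk_sequence((c for c in "AAATTT"), 3)
except TypeError as exc:
    print("not a sequence:", exc)
```

The last call is the downside mentioned above: slicing needs len() and indexing, which a plain iterable does not provide.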

But it's worth knowing the alternative too:

i = iter(lst)
zip(i, i, i)

iter just asks a sequence or other iterable for a single-pass iterator over its contents.

Then zip just advances them in lockstep, as usual.

Because all three of zip's arguments are references to the exact same iterator, advancing one advances all of them. (This is why we can't just do zip(iter(lst), iter(lst), iter(lst)): that would create three separate iterators, each starting from the beginning.)
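To make the difference concrete, here is a small sketch comparing one shared iterator against three separate ones. The sample list is just illustrative data.

```python
lst = ['A', 'A', 'C', 'T', 'G', 'C']

# One iterator passed three times: each tuple consumes three consecutive items.
shared = iter(lst)
shared_groups = list(zip(shared, shared, shared))
print(shared_groups)    # [('A', 'A', 'C'), ('T', 'G', 'C')]

# Three independent iterators: each one starts at the beginning,
# so every tuple repeats a single item three times.
separate_groups = list(zip(iter(lst), iter(lst), iter(lst)))
print(separate_groups[:2])  # [('A', 'A', 'A'), ('A', 'A', 'A')]
```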


But what if you want to group by 2, or 5? Writing separate functions for zip(i, i) and zip(i, i, i, i, i) and so on wouldn't be very nice.

If we had a sequence of n references to the iterator, we could use *args syntax, as described in the tutorial under Unpacking Argument Lists, to just call zip(*sequence).

And we can easily get such a sequence by using the * repetition operator: [i]*n. (If you don't understand why that ends up with n references to one iterator, instead of n separate iterators, read the Python FAQ's entry on How do I create a multidimensional list?.)

And you can put that all together into a one-liner:

zip(*[iter(lst)]*n)
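Wrapped in a function, the one-liner might look like the sketch below. The name group is hypothetical, and list() is only there so the result can be printed (in Python 3, zip returns a lazy iterator).

```python
def group(iterable, n):
    """Return non-overlapping n-tuples; a trailing partial group is dropped."""
    # [iter(iterable)] * n is a list holding n references to ONE iterator.
    return list(zip(*[iter(iterable)] * n))


print(group("AACTGCAG", 3))
# [('A', 'A', 'C'), ('T', 'G', 'C')]   (the leftover 'AG' is dropped)

# Join the tuples back into strings to get codons.
print(["".join(g) for g in group("AAATTT", 3)])
# ['AAA', 'TTT']

# Unlike slicing, this also works on a generator.
print(group((c for c in "AAATTT"), 3))
# [('A', 'A', 'A'), ('T', 'T', 'T')]
```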

If there's a partial group left over, this will drop it, because zip stops as soon as its arguments run out. If you'd rather handle that case differently, you can replace zip with a different function. For example, to pad the partial group with spaces:

itertools.zip_longest(*[iter(lst)]*3, fillvalue=' ')

The itertools recipes in the docs have a function called grouper which does this for you.
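For reference, a simplified sketch in the spirit of that recipe is shown below. This is not copied from the docs, and the version there may differ in its signature and options, so treat the details as assumptions.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into n-tuples, padding the last one with `fillvalue`."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


print(list(grouper("AACTGCAG", 3, fillvalue=" ")))
# [('A', 'A', 'C'), ('T', 'G', 'C'), ('A', 'G', ' ')]
```

Compared with plain zip, the trailing 'AG' is kept and padded out to a full group instead of being dropped.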

abarnert
  • Where would I look for documentation on the syntax you use in `zip(*[iter(lst)]*3)` ? Unclear on what the asterisks do and not sure where to find an explanation. – Mark R. Wilkins Aug 30 '13 at 21:19
  • @MarkR.Wilkins: Reorganized the answer and added links, in a way that should help you find the explanations. – abarnert Aug 30 '13 at 21:40

With a nod to Óscar, who figured out the bulk of the problem, I think the OP is asking about something like this:

codon = 'AACTGCAGCTCA'

codons = [codon[i:i+3] for i in range(0, len(codon), 3)]

=> ['AAC', 'TGC', 'AGC', 'TCA']

The list ['AAC', 'ACT', 'CTG', 'TGC', 'GCA', 'CAG', 'AGC', 'GCT', 'CTC', 'TCA'] was an unintended result of the OP's code: the range advances by one instead of three, so each triplet starts with the last two characters of the previous one.

Edit: Also, this chunk of code:

for p in patCodon:
    for d in dna:
        if d == p:
            x = dna[p]
            print(p)
            patCode.append(x)

should probably be this instead:

for p in patCodon:
    if p in dna:
        x = dna[p]
        print(p)
        patCode.append(x)

The reason is that checking for membership with in is a single hash lookup on a dict, which is much faster than looping over every key and comparing.

This will only work if dna is a dict. If dna is a list, the same syntax will work to check whether p is in dna, but x = dna[p] is probably a mistake.
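Putting the grouping and the lookup together, a minimal sketch could look like this. It assumes dna is a dict mapping codons to amino acid names; the six entries below are a small illustrative excerpt rather than a full table, and translate is a hypothetical helper name.

```python
# Assumed structure: codon -> amino acid. Only a few entries, for illustration.
dna = {
    "AAA": "Lys",
    "TTT": "Phe",
    "AAC": "Asn",
    "TGC": "Cys",
    "AGC": "Ser",
    "TCA": "Ser",
}


def translate(sequence, table, group_size=3):
    """Split `sequence` into non-overlapping codons and look each one up."""
    codons = [sequence[i:i + group_size]
              for i in range(0, len(sequence), group_size)]
    # Codons missing from the table are skipped, mirroring `if p in dna`.
    return [table[c] for c in codons if c in table]


print(translate("AAATTT", dna))        # ['Lys', 'Phe']
print(translate("AACTGCAGCTCA", dna))  # ['Asn', 'Cys', 'Ser', 'Ser']
```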

Mark R. Wilkins

You mean, like this?

lst = ['AAC', 'ACT', 'CTG', 'TGC', 'GCA', 'CAG', 'AGC', 'GCT', 'CTC', 'TCA']
[lst[i:i+3] for i in range(0, len(lst), 3)]

=> [['AAC', 'ACT', 'CTG'], ['TGC', 'GCA', 'CAG'], ['AGC', 'GCT', 'CTC'], ['TCA']]

The above will iterate over the original list and create sublists of at most three elements. Note that the last sublist can have 1, 2 or 3 elements.

Óscar López