
I want to read an amino acid sequence ("ACDEFGHIKL") in blocks of a predefined motif length, say 3, and print them. The output would be [ACD, EFG, HIK]. Next, I want to shift the starting position by 1, so the next output should be [CDE, FGH, IKL].

I wrote the following Python code, which works absolutely fine. I just want to know whether there is a simpler way to write it.

motif_len=int(motif_len)

if len(AA_seq)>=motif_len:
    for i in range(len(AA_seq)-motif_len+1):
        a=i
        b=i+motif_len
        # print(a,b)
        print(AA_seq[a:b])

Any comments or suggestions will be appreciated. I was wondering whether Python has a prebuilt library for this kind of function. Thanks.

shivam
  • Why do you cast motif_len to an integer? Isn't it already an integer? Why do you assign a and b instead of using the expressions directly? – Fran Arenas Mar 31 '22 at 06:23
  • Actually, I am passing the value of motif_len on the command line, which passes it as a string by default, so I have to convert it to an integer to perform mathematical operations. – shivam Mar 31 '22 at 06:38

3 Answers


You can use the re (regular expression) module to get the list of blocks:

import re
re.findall('...','ACDEFGHIKL')

Another option is the textwrap module. Note that, unlike the regex, wrap also returns the trailing partial block ('L' here):

from textwrap import wrap
wrap('ACDEFGHIKL', 3)

To get the shifted outputs, drop the first character and repeat:

s_cur = s

for i in range(len(s)):
    print(get_blocks(s_cur))
    s_cur = s_cur[1:]

Where get_blocks is a function that uses one of the two methods above.
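
For reference, here is a minimal runnable sketch of the loop above. It assumes get_blocks is built on re.findall and takes a hypothetical block_len parameter; neither the signature nor the parameter name comes from the snippet above. Stopping after block_len shifts is also an assumption, made because later shifts only repeat blocks that were already printed.

```python
import re

def get_blocks(seq, block_len=3):
    """Return non-overlapping blocks of exactly block_len characters.

    Hypothetical helper: trailing characters that do not fill a
    whole block are dropped, as with re.findall('...', seq).
    """
    return re.findall("." * block_len, seq)

s = "ACDEFGHIKL"
# One list of blocks per starting offset (0, 1, 2).
frames = [get_blocks(s[offset:]) for offset in range(3)]
for frame in frames:
    print(frame)
```

With this input, the first two lists are the two outputs asked for in the question.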

joaopfg

If I've understood correctly, you are implementing what is known as a 'sliding window' or 'rolling window'.

I was looking into this recently and came across the following thread:

Rolling or sliding window iterator?

This article may also be of interest:

https://medium.com/geekculture/implement-a-sliding-window-using-python-31d1481842a7

My conclusion was that there's no obvious built-in function to call for this one, and that the simplest implementation is basically the one you have already worked out for yourself!
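
If a reusable iterator is preferred, one possible sketch is below. It follows the sliding_window recipe from the itertools documentation, which is a recipe to copy rather than a function you can import; the variable names are illustrative only.

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield overlapping windows of length n as tuples."""
    it = iter(iterable)
    # Pre-fill with the first n-1 items; maxlen drops the oldest item
    # automatically each time a new one is appended.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)

# Join each tuple of characters back into a string.
windows = ["".join(w) for w in sliding_window("ACDEFGHIKL", 3)]
print(windows)
```

For a plain string, slicing as in the question is shorter; the generator form mainly helps when the input is an arbitrary iterable that cannot be sliced.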

ljdyer

I would go this way:

group_len = 4
AA_seq = "ACDEFGHIKL"
# range() is already empty when AA_seq is shorter than group_len, so no extra length check is needed
print([AA_seq[i:i + group_len] for i in range(len(AA_seq) - group_len + 1)])

For this specific case, the result is:

['ACDE', 'CDEF', 'DEFG', 'EFGH', 'FGHI', 'GHIK', 'HIKL']
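
The list above contains every overlapping window. If the non-overlapping blocks from the question are wanted instead, the same comprehension could be adapted by giving range a step. This is only a sketch: the function name blocks and its offset parameter are made up for illustration.

```python
def blocks(seq, group_len, offset=0):
    """Non-overlapping blocks of group_len characters, starting at offset."""
    last_start = len(seq) - group_len  # last index where a full block fits
    return [seq[i:i + group_len]
            for i in range(offset, last_start + 1, group_len)]

AA_seq = "ACDEFGHIKL"
print(blocks(AA_seq, 3))            # blocks starting at index 0
print(blocks(AA_seq, 3, offset=1))  # blocks starting at index 1
```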

gil