
I have a list called lst, as follows:

lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']

What I want to know is how to split this into four-letter strings: the first made from the first, second, third, and fourth letters, the next from the second, third, fourth, and fifth letters, and so on. Each string should then be added to a new list, which will be compared to a main list.

Thanks

Ajax1234
shamp113
  • Do you know how to make the first four-letter string? – Ted Brownlow Aug 25 '19 at 00:32
  • It sounds like you're looking for a "sliding/rolling window", and might be able to achieve the first part of what you're looking for from here: https://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator . I'm not sure what is meant by needing to compare the new list to the old list. – Cameron Yick Aug 25 '19 at 00:33
  • Welcome to Stack Overflow. Do you mean you want the output `["ACTG", "CTGA", "TGAC", "GACG", "ACGC", "CGCA", "GCAG"]` from your example input? Will you always want four-letter strings, or could the desired strings have a different length? Also, what have you tried, and just where are you stuck? – Rory Daulton Aug 25 '19 at 00:34
  • @RoryDaulton yes that's what i am trying to get as my output. I am new to Python and basically trying to learn on the fly. I really am not sure what to try. – shamp113 Aug 25 '19 at 00:37
  • Welcome to SO. Please take the [tour] and take the time to read [ask] and the other links found on that page. This isn't a discussion forum or a Tutorial. ... https://docs.python.org/3/tutorial/index.html – wwii Aug 25 '19 at 01:31
  • [Why “Can someone help me?” is not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question) – wwii Aug 25 '19 at 01:32

4 Answers


To get the first sublist, use lst[0:4]. Use Python's str.join method to merge it into a single string. Use a for loop over the valid starting indices to get all the sublists.

sequences = []
sequence_size = 4
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']

for i in range(len(lst) - sequence_size + 1):
    sequence = ''.join(lst[i : i + sequence_size])
    sequences.append(sequence)

print(sequences)
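
The loop above could also be wrapped in a small reusable function. This is only a minimal sketch: the name `sliding_windows` and its parameters are illustrative and not part of the original answer.

```python
def sliding_windows(items, size):
    """Return every run of `size` consecutive items, joined into one string."""
    if size <= 0:
        raise ValueError("size must be positive")
    # range() is empty when size > len(items), so the result is then [].
    return [''.join(items[i:i + size]) for i in range(len(items) - size + 1)]


lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
sequences = sliding_windows(lst, 4)
print(sequences)
# ['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']
```

Passing the window size as an argument keeps the same code usable if the strings ever need a different length.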
Irfan434

All 4-grams (without padding):

# window size:
ws = 4
lst2 = [
    ''.join(lst[i:i+ws])
    for i in range(0, len(lst))
    if len(lst[i:i+ws]) == ws
]

Non-overlapping 4-grams:

lst3 = [
    ''.join(lst[i:i+ws])
    for i in range(0, len(lst), ws)
    if len(lst[i:i+ws]) == ws
]
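
To make the difference between the two variants concrete, here is a self-contained sketch using the list from the question. It repeats `lst` and `ws` so it runs on its own, and it bounds the `range` instead of filtering by length, which should give the same windows. The names `overlapping` and `non_overlapping` are illustrative only.

```python
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
ws = 4  # window size

# Step of 1: each window starts one position after the previous one.
overlapping = [''.join(lst[i:i + ws]) for i in range(len(lst) - ws + 1)]

# Step of ws: each window starts where the previous one ended.
non_overlapping = [''.join(lst[i:i + ws]) for i in range(0, len(lst) - ws + 1, ws)]

print(overlapping)      # ['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']
print(non_overlapping)  # ['ACTG', 'ACGC']
```

Note that in the non-overlapping case the trailing 'A', 'G' is dropped, because it does not fill a complete window.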
Mehdi

Use:

lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
i = 0
new_list = []
# Stop 3 short of the end so that lst[i+3] is always a valid index.
while i < len(lst) - 3:
    new_list.append(lst[i] + lst[i+1] + lst[i+2] + lst[i+3])
    i += 1
print(new_list)

Output:

['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']
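
If you would rather not write out lst[i], lst[i+1], ... by hand, one possible alternative (not part of the original answer) is to zip shifted copies of the list together. A rough sketch, where `n`, `shifted`, and `new_list` are illustrative names:

```python
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
n = 4  # window length

# n copies of the list, each starting one position later than the previous.
shifted = [lst[k:] for k in range(n)]

# zip() stops at the shortest copy, so only complete windows are produced.
new_list = [''.join(group) for group in zip(*shifted)]
print(new_list)
# ['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']
```

This makes copies of the list, so for very long inputs the index-based loop may use less memory.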
ansev

I think the other answers solve your problem, but if you are looking for a more Pythonic way to do this, you can use a list comprehension. List comprehensions are often recommended for keeping code concise, although they can sometimes hurt readability. This version is also considerably shorter.

lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
result = [''.join(lst[i:i+4]) for i in range(len(lst)-3)]
print(result)
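
The question also mentions comparing the new list to a main list, without saying what that comparison should be. One possible reading is "which of the four-letter strings also appear in the main list". The sketch below assumes that reading; `main_list` and its contents are made up for illustration, since the question does not show the real one.

```python
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
result = [''.join(lst[i:i+4]) for i in range(len(lst)-3)]

# Hypothetical reference list; replace with the real main list.
main_list = ['TGAC', 'GGGG', 'CGCA']

# Sets make each membership test fast.
main_set = set(main_list)
result_set = set(result)

# Strings from result that also occur in main_list, in result order.
matches = [s for s in result if s in main_set]
# Strings from main_list that never occur in result.
missing = [s for s in main_list if s not in result_set]

print(matches)  # ['TGAC', 'CGCA']
print(missing)  # ['GGGG']
```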
GusSL