I have a file with sequences like this:
>info
ATG
>info
GA
>info
TTAG
>info
ATTTT
I'd like to read this into a matrix:
matrix[0][0]=A , matrix[0][1]=T, matrix[0][2]=G
matrix[1][0]=G , matrix[1][1]=A
matrix[2][0]=T , matrix[2][1]=T, matrix[2][2]=A , matrix[2][3]=G
etc.
Is this even possible in Python (PyCharm), and if so, how could I do it?
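For reference, rows of different lengths fit naturally into a list of lists, so the indexing shown above does work in Python. A minimal sketch, assuming the four sequences are already in memory as plain strings (the names `sequences` and `matrix` are only illustrative):

```python
# Assumption: the sequences have already been extracted as strings.
sequences = ["ATG", "GA", "TTAG", "ATTTT"]

# list(s) splits a string into single characters, giving one row per sequence.
# Rows may have different lengths; Python lists do not need to be rectangular.
matrix = [list(s) for s in sequences]

print(matrix[0][2])    # third base of the first sequence
print(matrix[2][3])    # fourth base of the third sequence
print(len(matrix[1]))  # length of the second row
```

Since strings can be indexed too, `sequences[0][2]` would give the same character; the list form is mainly useful if individual entries need to be modified later.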
NEW code so far:
import re

def read(sek):
    listA=[]
    regex = re.compile(r"[;>](?P<description>[^\n]*)\n(?P<sequence>[^;>]+)")
    with open(sek, "r") as file:
        seq = regex.findall(file.read())
        for i, info in enumerate(seq):
            description, sequence = info
            for j < len(sequence):
                listA[i][j]= sequence
                j=j+1
            i=i+1
    file.close()
    return(listA)

read('sequence1.FASTA')
New error message: SyntaxError: invalid syntax
(The original file has description lines, but I already have a solution for that, so I didn't write it in this question.)
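The SyntaxError most likely points at `for j < len(sequence):`, because a `for` statement needs the form `for x in iterable`. Once that is fixed, `listA[i][j] = ...` would still raise an IndexError, since `listA` starts out empty and its rows have to be appended first. A minimal sketch of one way around both problems, reusing the regex from the code above; the helper name `parse_fasta` and the constant `FASTA_RECORD` are hypothetical, and the sample text stands in for the real file:

```python
import re

# Same pattern as in the question: a ">" or ";" header line, then the sequence text.
FASTA_RECORD = re.compile(r"[;>](?P<description>[^\n]*)\n(?P<sequence>[^;>]+)")

def parse_fasta(text):
    """Return one list of characters per record found in the text."""
    matrix = []
    for description, sequence in FASTA_RECORD.findall(text):
        # split() removes newlines, so sequences spanning several lines are joined.
        bases = "".join(sequence.split())
        # Append a whole row instead of assigning to an index that does not exist yet.
        matrix.append(list(bases))
    return matrix

def read(sek):
    # The with block closes the file on exit, so no explicit close() is needed.
    with open(sek, "r") as file:
        return parse_fasta(file.read())

# Assumption: this sample mirrors the file layout shown at the top of the question.
sample = ">info\nATG\n>info\nGA\n>info\nTTAG\n>info\nATTTT\n"
matrix = parse_fasta(sample)
print(matrix[2])
```

Calling `read('sequence1.FASTA')` would then return the same kind of nested list for the real file. The manual `i=i+1` and `j=j+1` counters are not needed, because the loop itself advances through the records.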