How can I read a line's characters from a file into a matrix?

Question

I have a file with sequences like this:

>info
ATG
>info
GA
>info
TTAG
>info
ATTTT

I'd like to read this into a matrix:

matrix[0][0]=A , matrix[0][1]=T, matrix[0][2]=G
matrix[1][0]=G , matrix[1][1]=A
matrix[2][0]=T , matrix[2][1]=T, matrix[2][2]=A , matrix[2][3]=G
ETC...

Is this even possible in Python (pycharm), and if it is, how could I do that?

NEW code so far:

def read(sek):
listA=[]
regex = re.compile(r"[;>](?P<description>[^\n]*)\n(?P<sequence>[^;>]+)")
with open(sek, "r") as file:
     seq = regex.findall(file.read())
     for i, info in enumerate(seq):
        description, sequence = info
        for j < len(sequence):
            listA[i][j]= sequence
            j=j+1
        i=i+1
file.close()
return(listA)
read('sequence1.FASTA')

new error message: SyntaxError: invalid syntax

(The original file has description lines, but I already have a solution for that, so I didn't include it in this question.)

It's definitely possible. Just provide the input and expected output (in a more cohesive sense, i.e. what happens at newlines), show what you've tried, and say where you are stuck. – C.B., Oct 16 '15 at 14:43
I'm stuck at the matrix part. I can read into a string, but when I tried the matrix format [X][X]=something, it printed an error message. – AmlesLausiv, Oct 16 '15 at 14:58
Possible duplicate of [What to do with "Unexpected indent" in python?](http://stackoverflow.com/questions/1016814/what-to-do-with-unexpected-indent-in-python) – en_Knight, Oct 16 '15 at 15:13
I didn't put in extra spaces; in the real code I only used the spaces the program inserted automatically. – AmlesLausiv, Oct 16 '15 at 15:18
Also, I don't know how to make the matrix index [i][X] turn into [i+1][X] at every new row. – AmlesLausiv, Oct 16 '15 at 15:21

ergonaut · Answer 1 · 2015-10-16T15:18:52.417

You can use a list of lists:

c = []
c.append(list("ATG"))
c.append(list("GA"))
c.append(list("TTAG"))
print(c[2][1])  # second character of the third row

You can create the matrix directly from the file, skipping the ">" description lines and any empty lines:

[list(x) for x in open('datafile').read().splitlines() if x and not x.startswith('>')]

[['A', 'T', 'G'], ['G', 'A'], ['T', 'T', 'A', 'G'], ['A', 'T', 'T', 'T', 'T']]

In your code, the body of the def block needs to be indented, just like the bodies of while, for, if, etc.
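As a rough sketch of how the function from the question could look once it is indented and builds rows with append, here is one possibility. It keeps the regex from the question; the names FASTA_RECORD and read_text are made up for this illustration, and it assumes each record is a ">" line followed by sequence text.

```python
import re

# Same pattern as in the question; the constant name is hypothetical.
FASTA_RECORD = re.compile(r"[;>](?P<description>[^\n]*)\n(?P<sequence>[^;>]+)")


def read_text(text):
    """Turn FASTA-like text into a list of rows, one list of characters per record."""
    matrix = []
    for description, sequence in FASTA_RECORD.findall(text):
        # The sequence group can contain newlines, so drop all whitespace first.
        matrix.append(list("".join(sequence.split())))
    return matrix


def read(sek):
    """Read the file at path `sek`; the with-block closes it, so no file.close()."""
    with open(sek, "r") as handle:
        return read_text(handle.read())


# In-memory sample with the same layout as the file in the question.
sample = ">info\nATG\n>info\nGA\n>info\nTTAG\n>info\nATTTT\n"
matrix = read_text(sample)
print(matrix[2][3])  # G
```

Rows can have different lengths, so matrix[1][2] raises IndexError here, because the second sequence only has two characters.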

edited Oct 16 '15 at 15:18

answered Oct 16 '15 at 15:01


This is not helpful because the problem he's asking about has nothing to do with the actual parsing, he just has an indenterror. He can post a new question if his actual code has problems – en_Knight Oct 16 '15 at 15:14
ident thing is not the main problem – AmlesLausiv Oct 16 '15 at 15:33

Martin Evans · Answer 2 · 2015-10-16T16:14:23.417

The following would load your data from your text file:

def read(sek):
    listA = []
    with open(sek, "r") as file:
        for line1 in file:
            listA.append(list(next(file).strip()))
    return listA

print(read('sequence1.FASTA'))

This would display the following output:

[['A', 'T', 'G'], ['G', 'A'], ['T', 'T', 'A', 'G'], ['A', 'T', 'T', 'T', 'T']]

Or if you prefer to use regular expressions, the following should also work:

import re

def read(sek):
    with open(sek, "r") as file:
        # re.M makes ^ match at the start of every line, so only sequence lines are captured
        return [list(line) for line in re.findall(r'^([ATGC]+)', file.read(), re.M)]

Note, if the file is huge, the first version avoids loading the whole file into memory at once, but could be slower.
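If the file might not strictly alternate between a description line and a sequence line, a line-by-line variant that skips headers explicitly could look like the sketch below. The name read_stream is made up for this example, and it assumes every non-empty line that does not start with ">" is one complete sequence (a sequence wrapped over several lines would end up as several rows).

```python
import io


def read_stream(lines):
    """Build the matrix from any iterable of lines, one line in memory at a time."""
    matrix = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and ">" description lines.
        if not line or line.startswith(">"):
            continue
        matrix.append(list(line))
    return matrix


# io.StringIO stands in for an open file; open(path) can be passed the same way.
rows = read_stream(io.StringIO(">info\nATG\n\n>info\nGA\n"))
print(rows)  # [['A', 'T', 'G'], ['G', 'A']]
```

Because it only iterates, the same function works on an open file object, for example inside a with open(...) block.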

edited Oct 16 '15 at 16:14

answered Oct 16 '15 at 15:05



C.B. · Answer 3 · answered Oct 16 '15 at 17:10

for j < len(sequence):

should be

while j < len(sequence):

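This fixes the syntax error on that line: a for statement must have the form for ... in ..., so a loop driven by a condition has to use while.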
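Changing the keyword alone may not be enough for the code in the question to run. A minimal sketch of the index-based loop, assuming j has to start at 0 and each row has to exist before it is filled (assigning to listA[i][j] on an empty list raises IndexError), could be written as follows. The function name to_rows is invented for this example.

```python
def to_rows(sequences):
    """Convert a list of strings into a list of character rows using a while loop."""
    listA = []
    for sequence in sequences:
        row = []
        j = 0  # the counter must be initialised before the loop
        while j < len(sequence):
            row.append(sequence[j])  # append grows the row; row[j] = ... would fail
            j = j + 1
        listA.append(row)  # a new row per sequence, so no manual i = i + 1 is needed
    return listA


print(to_rows(["ATG", "GA"]))  # [['A', 'T', 'G'], ['G', 'A']]
```

The outer for loop moves on to the next sequence by itself, which is what takes the index from [i][X] to [i+1][X] at every new row.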

answered Oct 16 '15 at 17:10

C.B.


Unfortunately, it didn't :( – AmlesLausiv Oct 16 '15 at 18:33
