General problem: I am trying to transpose a large NumPy matrix using matrix.T. This works well with a small test file. With the big file, however, only the first 3 and the last 3 lines are transposed; the lines in between (~250,000 in total) are not transposed and are printed as '...'. In addition, only the first and last 3 nucleotides per line are displayed. The output looks like this:
[['C' 'T' 'C' ..., 'A' 'C' 'T']
['C' 'T' 'A' ..., 'A' 'T' 'G']
['C' 'T' 'A' ..., 'G' 'C' 'A']
...,
['T' 'A' 'A' ..., 'G' 'A' 'T']
['T' 'A' 'A' ..., 'C' 'G' 'T']
['C' 'G' 'T' ..., 'A' 'A' 'G']]
This is my code:
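import numpy as np
# Read each sequence as one row of single characters.
with open("temp1.txt", "rt") as infile:
    matrix = np.matrix([list(line.strip()) for line in infile.readlines()])
x = matrix.T
# Write the string form of the transposed matrix to temp2.txt.
with open("temp2.txt", "wt") as file_temp2:
    file_temp2.write(str(x))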
Explanation: 1. temp1.txt contains ~250,000 DNA sequences, each 100 nucleotides long (A, C, T and G). Each line ends with "\n" after the 100 nucleotides. The first lines look like this:
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAATCTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTC
TTTATGTTTGGACATTTATTGTCATTCTTACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTGTAGGGATGAAG
CAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCGTAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAAT
AAAAAAGTTGTAATTATTAATGATAGTTCTGTGATTCCTCCATGAATCACATCTGCTTGATTTTTCTTTCATAAATTTATAAGTAATACATTCTTATAAA
TATATGGAAGATGTGAATGAAGTTTTGGTCCTGAATGTGGCCAAGGTTCCGTCATTTGGAGATACGAAATCAAATCTCCTTTAAGATTTTGTTTTTATAA
and so on
2. temp1.txt is converted into a NumPy matrix and then transposed, which works fine with a test file (containing only 10 sequences). With the big file, however, the general problem described above occurs when transposing.
?Solution?: Do you have an idea how to get the complete transposed matrix of the big file, so that I can write it to my temp2.txt for further analysis?
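A note on the likely cause, as a hedged sketch rather than a definitive diagnosis: the transpose itself is complete, but str() on a NumPy array with more than 1,000 elements (the default print threshold) produces an abbreviated summary containing '...'. The small array below is a hypothetical stand-in for the real data, and raising the threshold through np.printoptions is one possible workaround, not necessarily the best one for a file of this size:

```python
import sys
import numpy as np

# Hypothetical stand-in for temp1.txt: 20 sequences of 100 nucleotides.
matrix = np.array([list("ACGT" * 25)] * 20)
transposed = matrix.T  # shape (100, 20), i.e. 2,000 elements

# With default print options, arrays above 1,000 elements are summarized.
abbreviated = str(transposed)

# Raising the threshold inside this block makes str() emit every element.
with np.printoptions(threshold=sys.maxsize):
    full = str(transposed)

print("..." in abbreviated, "..." in full)
```

The 10-sequence test file stays below the threshold (10 x 100 = 1,000 elements), which would explain why it was written out in full.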
!!!Solution found: I finally found that I have to convert the matrix into a list before saving: I have to run y = np.array(x)[0:].tolist() before writing to the file. Now it works. The code is now:
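import numpy as np
with open("temp1.txt", "rt") as infile:
    matrix = np.matrix([list(line.strip()) for line in infile.readlines()])
x = matrix.T
# Convert to a plain Python list, whose str() form is never abbreviated.
y = np.array(x)[0:].tolist()
# Put each transposed row on its own line.
z = str(y).replace("], [", "\n")
with open("temp2.txt", "wt") as file_temp2:
    file_temp2.write(z)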
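As a possible alternative, here is a minimal sketch that writes the transposed matrix row by row instead of building one large string. It assumes that every line in the input has the same length and that each transposed row should be written as a plain string without quotes, commas or brackets, which differs from the output format of the code above. The function name write_transposed is hypothetical:

```python
import numpy as np

def write_transposed(in_path, out_path):
    """Write the transpose of a file of equal-length sequences, one row per line."""
    with open(in_path, "rt") as infile:
        # Skip blank lines; each remaining line becomes a row of single characters.
        rows = [list(line.strip()) for line in infile if line.strip()]
    transposed = np.array(rows).T
    with open(out_path, "wt") as outfile:
        for row in transposed:
            # Join the characters of one transposed row into a plain string.
            outfile.write("".join(row) + "\n")
    return transposed.shape
```

Calling write_transposed("temp1.txt", "temp2.txt") on the file described above should yield 100 lines of ~250,000 characters each, since position i of every sequence ends up on output line i.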