I am trying to transpose a huge tab-delimited file with about 6000 rows and 2 million columns. The preferable approach should not involve holding the whole file in memory, which seems to be what the answer in this question does:
- Are the columns fixed width, or do they all have different widths? – Sven Marnach Jun 18 '13 at 10:20
- Unfortunately the first two columns are different from the others; they are text strings with different widths, but the other columns are all numbers with fixed widths. – qed Jun 18 '13 at 10:55
- But these two columns are not of much importance and can be removed if necessary. – qed Jun 18 '13 at 11:10
- I just left an answer to a question identical to yours here: http://stackoverflow.com/questions/7156539/how-do-i-transpose-pivot-a-csv-file-with-python-without-loading-the-whole-file/26122437#26122437 – tommy.carstensen Sep 30 '14 at 13:46
1 Answer
One approach would be to iterate over the input file once for every column (untested code!):

    import itertools

    with open("input") as f, open("output", "w") as g:
        try:
            for column_index in itertools.count():
                f.seek(0)
                # Strip the newline so the last column is written cleanly.
                col = [line.rstrip("\n").split("\t")[column_index] for line in f]
                g.write("\t".join(col) + "\n")
        except IndexError:
            pass
This is going to be very slow, but it only keeps a single line at a time in memory.
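The pass-per-column idea above can also be extended to gather a small block of columns per pass, cutting the number of passes over the file at a modest memory cost. A minimal sketch of that variant (the `BLOCK` size and the file names are illustrative assumptions, not from the answer):

    import itertools

    BLOCK = 2  # assumed block size: columns gathered per pass over the input

    # Create a tiny sample input for demonstration.
    with open("input.tsv", "w") as f:
        f.write("a\tb\tc\n1\t2\t3\n")

    with open("input.tsv") as f, open("output.tsv", "w") as g:
        for start in itertools.count(step=BLOCK):
            f.seek(0)
            # Collect columns start .. start+BLOCK-1 from every row.
            cols = [[] for _ in range(BLOCK)]
            found = False
            for line in f:
                fields = line.rstrip("\n").split("\t")
                for i in range(BLOCK):
                    if start + i < len(fields):
                        cols[i].append(fields[start + i])
                        found = True
            if not found:
                break  # no row has a column at this index: done
            for col in cols:
                if col:
                    g.write("\t".join(col) + "\n")

Memory use grows with `BLOCK` times the number of rows, so for 6000 rows even a few hundred columns per pass stays small while reducing 2 million passes to a few thousand.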

Sven Marnach