
I am trying to transpose a huge tab-delimited file with about 6000 rows and 2 million columns. Preferably, the approach should not involve holding the whole file in memory, which seems to be what the answer to this question does:

How to do row-to-column transposition of data in csv table?

  • Are the columns fixed width, or do they all have different widths? – Sven Marnach Jun 18 '13 at 10:20
  • Unfortunately, the first two columns are different from the others: they are text strings of varying widths, but the other columns are all fixed-width numbers. – qed Jun 18 '13 at 10:55
  • But these two columns are not of much importance and can be removed if necessary. – qed Jun 18 '13 at 11:10
  • I just left an answer to a question identical to yours here: http://stackoverflow.com/questions/7156539/how-do-i-transpose-pivot-a-csv-file-with-python-without-loading-the-whole-file/26122437#26122437 – tommy.carstensen Sep 30 '14 at 13:46

1 Answer


One approach would be to iterate over the input file once for every column (untested code!):

with open("input") as f, open("output", "w") as g:
    try:
        for column_index in itertools.count():
            f.seek(0)
            col = [line.split("\t")[column_index] for line in f]
            g.write("\t".join(col) + "\n")
    except IndexError:
        pass

This is going to be very slow, but only keeps a single line at a time in memory.
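If one full pass per column is too slow for 2 million columns, a variation along the same lines is to transpose a batch of columns per pass, trading a bounded amount of extra memory for far fewer passes over the file. This is a rough, untested sketch; the batch size CHUNK is an assumed tuning knob, and it assumes every row has the same number of fields.

import itertools

CHUNK = 1000  # assumed batch size: columns transposed per pass; tune to available memory

with open("input") as f, open("output", "w") as g:
    for start in itertools.count(step=CHUNK):
        f.seek(0)  # rewind for each batch of columns
        cols = None
        for line in f:
            fields = line.rstrip("\n").split("\t")[start:start + CHUNK]
            if cols is None:
                # one output row (list of values) per column in this batch
                cols = [[] for _ in fields]
            for dest, value in zip(cols, fields):
                dest.append(value)
        if not cols:
            break  # ran past the last column, done
        for col in cols:
            g.write("\t".join(col) + "\n")

With 6000 rows, each batch holds roughly 6000 × CHUNK values in memory, so the memory use stays bounded while the number of passes drops from 2 million to about 2 million / CHUNK.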

Sven Marnach