I need to write a Python script that takes an existing file split into columns and rows, reads in each column, and outputs each column as a row in the output file. The file is a matrix of numbers, and the process is transposing that matrix and writing out the result. The issue is that my matrix file is so huge that I literally cannot hold the entire thing in memory; attempting to do so crashes with a memory error. Every solution I've found so far requires one of three things: reading the entire infile at once; reading through every row of the infile over and over, grabbing one number at a time; or reading through the infile once but parsing through the outfile over and over to append the next number to each row.

Example input:

1,2,3

4,5,6

7,8,9

Example output:

1,4,7

2,5,8

3,6,9

Additional info: The file is in plaintext. The delimiter can be a comma, a space, or a tab, depending on the file. The matrix is not square.

Edit: Final solution.

Unfortunately, it seems the task could not be done the way I wanted. Given the tight memory constraints, there wasn't much that could be done other than looping through the infile or the outfile multiple times. So the final solution is to read the infile over and over: each pass constructs one outfile row and writes it out, and then the process repeats for the next row.
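A minimal sketch of that multi-pass approach follows. It is not the exact script used; the function names (`split_fields`, `transpose_multipass`), the comma output delimiter, and the rule that any run of commas, tabs, or spaces separates fields are all assumptions made for illustration. Values are written as soon as they are read, so only one input line is held in memory at a time.

```python
import re

# Assumption: fields are separated by a comma, a tab, or a run of spaces.
_DELIMS = re.compile(r"[,\t ]+")


def split_fields(line):
    """Split one text row into its fields, dropping empty edge fields."""
    return [f for f in _DELIMS.split(line.strip()) if f]


def transpose_multipass(in_path, out_path, out_delim=","):
    """Transpose a text matrix by rescanning the input once per column."""
    # Preliminary pass: take the column count from the first non-blank row.
    n_cols = 0
    with open(in_path) as src:
        for line in src:
            fields = split_fields(line)
            if fields:
                n_cols = len(fields)
                break

    with open(out_path, "w") as dst:
        for col in range(n_cols):
            # One full pass over the input builds one output row.
            with open(in_path) as src:
                first = True
                for line in src:
                    fields = split_fields(line)
                    if not fields:
                        continue  # skip blank lines
                    if not first:
                        dst.write(out_delim)
                    dst.write(fields[col])
                    first = False
            dst.write("\n")
```

For an N x M input this makes M + 1 passes over the infile, which matches the slow but memory-safe behaviour described above.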

  • How is your file delimited? If it is a csv file, have you tried https://docs.python.org/2/library/csv.html ? – derricw Oct 28 '14 at 21:36
  • You __cannot__ (at least not without a lot of difficulty). A file must be read sequentially, or you have to manually seek to positions to read .... – Joran Beasley Oct 28 '14 at 21:37
  • Some are comma delimited, but some are also space or tab delimited. It's inconsistent. – user3712957 Oct 28 '14 at 21:42
  • Is it a square matrix? – kevinsa5 Oct 28 '14 at 21:45
  • It is not. It's an N x M matrix. – user3712957 Oct 28 '14 at 21:48
  • If your solution is taking a long time, it might be useful to pre-process the input file to make the delimiters and formatting clean. Then you can have more efficient code that processes from rows to columns. But I see no way around scanning the input file multiple times. – Fred S Oct 28 '14 at 23:08
  • It would seem everyone agrees with you there. I've taken the pre-processing steps, and I've resigned myself to scanning the input multiple times. It functions now. It just takes a while to do so many loops through the infile. – user3712957 Oct 31 '14 at 17:08
  • So you are just transposing the matrix? Just read in line-by-line and zip the lines together. – Mr. Polywhirl Oct 31 '14 at 17:15

1 Answer

You can get around your memory constraints by using a binary file:

  1. Determine the type of numbers you have in the file (can they be represented by an unsigned integer? signed integer? long? float? double?), how long each line is, and the number of lines.
  2. Read your ASCII file in one line at a time. For each line:
    1. Parse the text into a list of numbers.
    2. Write the numbers to a binary file.
  3. Build each line of the output ASCII file by:
    1. Seeking to the next value to be output in the binary file and reading it.
    2. Writing formatted text to your transposed ASCII file.
    3. Writing the newline at the end of each column that has become a row. (I always forget the silly '\n' the first time.)
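The steps above can be sketched roughly as follows, using the standard `struct` module. This is only an illustration under stated assumptions: every value is stored as a little-endian 64-bit float (`"<d"`), fields are separated by commas, tabs, or spaces, and the function names are hypothetical. If your numbers are integers, a format such as `"<q"` would be the analogous choice.

```python
import re
import struct

FMT = "<d"                    # assumption: each value fits a 64-bit float
SIZE = struct.calcsize(FMT)   # bytes per stored value (8 for "<d")
_DELIMS = re.compile(r"[,\t ]+")


def text_to_binary(in_path, bin_path):
    """Steps 1-2: stream text rows into a fixed-width binary file.

    Returns (n_rows, n_cols) so the reader can compute offsets later.
    """
    n_rows = n_cols = 0
    with open(in_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            fields = [f for f in _DELIMS.split(line.strip()) if f]
            if not fields:
                continue  # skip blank lines
            if n_cols == 0:
                n_cols = len(fields)
            elif len(fields) != n_cols:
                raise ValueError("ragged row at data row %d" % (n_rows + 1))
            dst.write(struct.pack("<%dd" % n_cols, *map(float, fields)))
            n_rows += 1
    return n_rows, n_cols


def binary_to_transposed_text(bin_path, out_path, n_rows, n_cols, out_delim=","):
    """Step 3: seek to each value in column order and write it as text."""
    with open(bin_path, "rb") as src, open(out_path, "w") as dst:
        for col in range(n_cols):
            for row in range(n_rows):
                # Value (row, col) lives at a fixed byte offset.
                src.seek((row * n_cols + col) * SIZE)
                (value,) = struct.unpack(FMT, src.read(SIZE))
                if row:
                    dst.write(out_delim)
                dst.write(repr(value))  # floats print as e.g. 1.0
            dst.write("\n")  # the column has become a row
```

Because every value has the same width, the position of any entry can be computed directly, so the text file is parsed only once and the binary file is never loaded whole. The many small seeks may still be slow on a very large file.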

This answer will help you write to binary files

This answer will help you read binary files. You'll have to modify the code in this answer to read only the bytes you need for your next value, and not the entire file. Take a look at the documentation for the read() and seek() functions of the file object to figure that out.

Good luck, and happy coding.

Community
Mark