I have a file containing a very large list of matrices (i.e. lists of lists of integers), which I want to load into the Python shell. The file content has the form

L = [ [[1,2],[3,4]], [[5,6],[7,8]], ... ]

so I tried to load it via "execfile(filename)". Unfortunately, I run out of memory this way. What am I doing wrong?

For comparison: the file size is about 2GB, while the machine has 100GB of memory. The matrices have dimensions of roughly 1000x1000.

Dune

1 Answer

My attempt using ast.literal_eval. If it doesn't work, I'll delete my answer but I think it's worth a shot:

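import ast

with open("bigfile.txt") as f:
    # skip the "L =" prefix: consume characters up to and including '='
    while True:
        c = f.read(1)
        if not c:
            # end of file reached without finding an assignment
            raise ValueError("no '=' found in file")
        if c == '=':
            break

    # literal_eval only parses the literal itself, not the assignment;
    # strip() drops the whitespace after '=' and any trailing newline
    # (this avoids seek(tell()-1), which is unreliable on text-mode files)
    L = ast.literal_eval(f.read().strip())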

Basically: open the file, read it character by character to skip the assignment (literal_eval doesn't evaluate assignments, only literal structures, a bit like JSON), and feed the rest of the huge file to the literal evaluator.

Since it is a different way of parsing the data, it may work, and as a bonus it is much safer than using exec or eval.

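EDIT: since your comment stated that it still took a lot of memory, I suggest writing the data line by line, so that ast.literal_eval can evaluate each line on its own (a row vector, or a whole matrix per line) and you can collect the results. As confirmed in the comments, putting one matrix per line brought memory use down to about 5GB.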
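
The line-by-line approach could be sketched roughly as follows. This is a minimal sketch, assuming the file holds exactly one bracketed matrix per line with no "L =" prefix; the function names `iter_matrices` and `load_matrices` are made up for illustration:

```python
import ast


def iter_matrices(path):
    """Yield one matrix (a list of lists) per non-empty line of the file.

    Assumed file layout (hypothetical): each line is a literal such as
    [[1,2],[3,4]] with no assignment in front of it.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                # Only one line is parsed at a time, so the parser's
                # temporary structures stay small.
                yield ast.literal_eval(line)


def load_matrices(path):
    """Collect all matrices from the file into a single list."""
    return list(iter_matrices(path))
```

If the matrices can be processed one at a time, iterating over `iter_matrices(path)` directly avoids holding the whole list in memory.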

Jean-François Fabre
  • Thanks a lot for your try! But unfortunately, it also consumes a huge amount of memory (80GB by now), and will surely run out of memory soon... – Dune Dec 29 '16 at 09:15
  • Since I am producing the file by myself (using another programming language), maybe I should give it another format? Would it help to put each matrix in a different line so that we can read (and parse) the file line by line? – Dune Dec 29 '16 at 09:17
  • yes, you could just dump the dimensions, then the values, line by line, without any brackets, or with brackets but line by line so `ast.literal_eval` can evaluate each line as a vector, and you can put it in your matrix. – Jean-François Fabre Dec 29 '16 at 09:48
  • It worked! Now I've put each matrix in a separate line and applied ast.literal_eval to each. This only takes 5GB of memory which is still a little bit strange but well enough. – Dune Jan 04 '17 at 13:05
  • excellent. I have edited my answer with the thing you tried and which worked for completeness. – Jean-François Fabre Jan 04 '17 at 13:09
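
The bracket-free variant mentioned in the comments (dump the dimensions, then the values line by line) might look something like this. It is only a sketch under an assumed layout that the thread does not specify: a header line "rows cols", followed by that many lines of whitespace-separated integers. The name `read_matrices` is hypothetical:

```python
def read_matrices(path):
    """Yield matrices from a file of 'rows cols' headers followed by rows.

    Assumed layout (not specified in the thread):
        2 2
        1 2
        3 4
    """
    with open(path) as f:
        for header in f:
            if not header.strip():
                continue  # tolerate blank lines between matrices
            rows, cols = map(int, header.split())
            matrix = []
            for _ in range(rows):
                line = next(f, None)
                if line is None:
                    raise ValueError("file ended in the middle of a matrix")
                row = [int(tok) for tok in line.split()]
                if len(row) != cols:
                    raise ValueError(
                        "expected %d values, got %d" % (cols, len(row)))
                matrix.append(row)
            yield matrix
```

This version does not need ast at all, since each value is converted with int() directly.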