I currently have a large file (~65 MB) containing a list like this:

[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]]...]

Python reads the above in as a string. I tried eval() and ast.literal_eval(), which both result in a MemoryError.
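
For reference, this is roughly what I'm doing (the filename is a placeholder):

import ast

# placeholder filename; the real file is ~65 MB of nested-list text
with open('data.txt') as f:
    text = f.read()

mlst = ast.literal_eval(text)  # raises MemoryError on the full file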

I then turned it into a dictionary, {mlst: list()}, which loads more quickly, but Python still sees the value as a string.

Is there a simple way to get Python to recognise that the string is a list? Or is it a case of building a JSON file, or some other format that Python can read more easily?

Thanks!

addyal

1 Answer


If I create a file (say, stack48648178.py) with:

x=[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]]]

that is, your string with a simple assignment prepended, then I can import it:

In [1]: import stack48648178
In [3]: stack48648178.x
Out[3]: 
[[[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]]]

In effect, importing the module does an eval on the file's text.
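
As a sketch of that approach (the filenames here are assumptions, not from the question), you can prepend the assignment programmatically and then import the result:

import importlib

# hypothetical filenames: raw nested-list text in data.txt
with open('data.txt') as f:
    text = f.read()
with open('stack48648178.py', 'w') as f:
    f.write('x=' + text)

mod = importlib.import_module('stack48648178')
print(len(mod.x))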

I'm not going to generate and test a large file, but if you are getting a memory error, it could be that the resulting list is simply too big for your RAM. With that degree of nesting, the list structure (with all its pointers) can get quite large. Small integers like these are cached by CPython, so each distinct value exists only once, but you still have several layers of pointers to get to each one.
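
A rough way to see that per-list overhead (a sketch; exact sizes vary by Python version and platform):

import sys

inner = [0, 0, 1]
# shallow size of one inner list object, not counting the int objects
# it points to; typically around 80 bytes on 64-bit CPython
print(sys.getsizeof(inner))

# so five characters of file text ("0,0,1") become one list object plus
# three 8-byte pointers: the in-memory form is much larger than the file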

There may be some sort of intermediate lists involved in parsing the string that boost the memory requirement, but that would be hard to track down.

The memory requirement for a NumPy array will be less, but you still have to go through the list parsing to create it:

In [4]: import numpy as np
In [5]: X = np.array(stack48648178.x)
In [6]: X.shape
Out[6]: (3, 3, 3)
In [7]: np.size(X)
Out[7]: 27
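
The array's data footprint is easy to check (a sketch; the default dtype is a platform integer, int64 on most 64-bit systems):

# 27 values at 8 bytes each: 216 bytes of data plus a small fixed header
print(X.nbytes)     # 216 with dtype int64
print(X.itemsize)   # 8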

For comparison, parsing the string held in memory:

In [8]: txt='[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,
   ...: 2],[0,0,3]]]'
In [9]: txt
Out[9]: '[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]]]'
In [10]: eval(txt)
Out[10]: 
[[[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]]]
In [11]: import json
In [12]: json.loads(txt)
Out[12]: 
[[[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]]]

eval and json.loads do the same thing here, producing the same nested list. Again, I can't say how much memory each uses during parsing, but I suspect it's about the same.
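
Since the text is already valid JSON, a sketch of loading it straight from the file (filename assumed; note that json.load still reads the whole file into memory before parsing):

import json

# hypothetical filename; json.load reads and parses the file object
with open('data.txt') as f:
    data = json.load(f)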

hpaulj
  • Thanks for this - it was a combination of a memory error and not using numpy for the issue. – addyal Feb 07 '18 at 08:22