I currently have a large file (~65 MB) containing a list like this:

[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]]...]

Python reads the above in as a string. I tried eval() and ast.literal_eval(), which both result in a MemoryError.
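
For reference, this is roughly what I'm doing (the filename is a placeholder):

import ast

# placeholder filename; the real file is ~65 MB of nested-list text
with open('data.txt') as f:
    text = f.read()

mlst = ast.literal_eval(text)  # raises MemoryError on the full file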

I then turned it into a dictionary, {mlst: list()}, which loads more quickly, but Python still sees the value as a string.

Is there a simple way to get Python to recognise that the string is a list? Or is it a case of building a JSON file, or some other format that Python can read more easily?

Thanks!

addyal

1 Answer


If I create a file (say, stack48648178.py) with:

x=[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]]]

that is, your string with a simple assignment prepended, then I can import it:

In [1]: import stack48648178
In [3]: stack48648178.x
Out[3]: 
[[[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]]]

In effect, importing the module does an eval on the file's text.
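
As a sketch of that approach (the filenames here are assumptions, not from the question), you can prepend the assignment programmatically and then import the result:

import importlib

# hypothetical filenames: raw nested-list text in data.txt
with open('data.txt') as f:
    text = f.read()
with open('stack48648178.py', 'w') as f:
    f.write('x=' + text)

mod = importlib.import_module('stack48648178')
print(len(mod.x))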

I'm not going to generate and test a large file, but if you are getting a memory error, it could be that the resulting list is simply too big for your RAM. With that degree of nesting, the list structure (with all its pointers) can get quite large. Small integers like these are cached by CPython, so each distinct value exists only once, but you still have several layers of pointers to get to each one.
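
A rough way to see that per-list overhead (a sketch; exact sizes vary by Python version and platform):

import sys

inner = [0, 0, 1]
# shallow size of one inner list object, not counting the int objects
# it points to; typically around 80 bytes on 64-bit CPython
print(sys.getsizeof(inner))

# so five characters of file text ("0,0,1") become one list object plus
# three 8-byte pointers: the in-memory form is much larger than the file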

There may be some sort of intermediate lists involved in parsing the string that boost the memory requirement, but that would be hard to track down.

The memory requirement for a NumPy array will be less, but you still have to go through the list parsing to create it:

In [4]: import numpy as np
In [5]: X = np.array(stack48648178.x)
In [6]: X.shape
Out[6]: (3, 3, 3)
In [7]: np.size(X)
Out[7]: 27
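
The array's data footprint is easy to check (a sketch; the default dtype is a platform integer, int64 on most 64-bit systems):

# 27 values at 8 bytes each: 216 bytes of data plus a small fixed header
print(X.nbytes)     # 216 with dtype int64
print(X.itemsize)   # 8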

For comparison, parsing the string held in memory:

In [8]: txt='[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,
   ...: 2],[0,0,3]]]'
In [9]: txt
Out[9]: '[[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]],[[0,0,1],[0,0,2],[0,0,3]]]'
In [10]: eval(txt)
Out[10]: 
[[[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]]]
In [11]: import json
In [12]: json.loads(txt)
Out[12]: 
[[[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]],
 [[0, 0, 1], [0, 0, 2], [0, 0, 3]]]

eval and json.loads do the same thing here, producing the same nested list. Again, I can't say how much memory each uses during parsing, but I suspect it's about the same.
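
Since the text is already valid JSON, a sketch of loading it straight from the file (filename assumed; note that json.load still reads the whole file into memory before parsing):

import json

# hypothetical filename; json.load reads and parses the file object
with open('data.txt') as f:
    data = json.load(f)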

hpaulj
  • Thanks for this - it was a combination of a memory error and not using numpy for the issue. – addyal Feb 07 '18 at 08:22