
How can I parse a big file using ast.literal_eval without causing a MemoryError? The file I want to parse is 41 MB, for example.

I watched the process and found that Python used more than 3 GB of memory. I'm on a 32-bit system, so that hits the process's maximum memory.

Why does ast.literal_eval take so much memory, when all it does is parse the text to get the data structure? Is there any way to reduce the memory usage?

By the way, the code is:

import ast

# Read the whole file into memory and parse it as a Python literal
with open(file_name, 'r') as f:
    data = ast.literal_eval(f.read())

The exception is:

  File "/usr/local/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/local/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
MemoryError

Thanks!

writalnaie
  • Regardless of what's going on, probably the right answer is going to be not to use `ast.literal_eval`. What does the file look like? – DSM Jan 04 '14 at 08:34
  • Try json or cjson if it is in JSON format – YOU Jan 04 '14 at 08:39
  • You can read the file in chunks. Read this: [Lazy method for reading big file in Python](http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) – sudip Jan 04 '14 at 08:44
  • @iamsudip: How would this make a difference? As far as I know `ast.literal_eval()` needs the entire data structure at once. – Tim Jan 04 '14 at 08:59
  • @DSM It's a mix of tuples, dicts, strings and integers, in a mess. I use the data structure at runtime, but when the process shuts down it gets lost, so I want to dump the data structure to disk in an easy-to-load way. Since Python provides ast.literal_eval, I thought it was a good choice. – writalnaie Jan 07 '14 at 14:52
  • @YOU: I use the data structure directly in Python format. It seems json provides an interface for converting between Python's tuples, lists, dicts, strings and numbers. Hmm, I can give it a try. I'm not sure whether json can also read it back. If so, I am curious why Python's ast.literal_eval cannot, as they both do almost the same job. – writalnaie Jan 07 '14 at 14:58
  • @writalnaie: but that's not really what `literal_eval` is for, and it doesn't surprise me a lot that it's having problems. Look at `ast.literal_eval('('*100+'3' + ')'*100)`, for example; only 201 characters and you get a `MemoryError`. Write your structures in JSON instead -- not only should it work, it should be faster than `ast.literal_eval` would have been. – DSM Jan 07 '14 at 16:13

1 Answer


I have been facing the same problem in my project, and I came across two solutions that might help future users.

1) Depending on the data structure, you might want to use a database like Redis (this is the one I am using, based on good reviews, though there are other databases too). Redis has a good Python client, redis-py, which is easy to use. Before you start storing your data in the database, you should think about the queries you will be running against it.

Here is the installation guide for redis server: http://redis.io/topics/quickstart (There are lots of blogs too for usage)

Here is how you can use redis from Python: http://redis-py.readthedocs.org/en/latest/
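A minimal sketch of the idea with redis-py, assuming a Redis server is running locally on the default port; the key name `my_data` and the sample structure are placeholders for this example:

import json
import redis  # pip install redis

# Assumes a local Redis server on the default port (6379).
r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Redis stores strings, so serialize the structure first.
# 'my_data' is an arbitrary key name for this example.
r.set('my_data', json.dumps({'name': 'example', 'counts': [1, 2, 3]}))

# Even after the Python process restarts, the data is one query away.
data = json.loads(r.get('my_data'))

For large structures you could also split the data across several keys, or use Redis hashes, depending on how you plan to query it.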

2) If the data structure is a dictionary, you can use `json.loads()`, which is easy and memory efficient. Even if the data structure is not a dictionary, you can wrap it in a dictionary under an arbitrary key; once

d = json.loads(file.read())

has run, you can get your data structure back by calling

d['arbitrary_key']
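For example, here is a sketch of that round trip, using `json.dump()`/`json.load()` on a file directly (the file name `data.json` and the key name `arbitrary_key` are placeholders). Note that JSON has no tuple type, so tuples come back as lists:

import json

# The structure here is a list of tuples, not a dict, so wrap it
# under an arbitrary key before dumping. JSON turns tuples into lists.
structure = [('a', 1), ('b', 2)]

with open('data.json', 'w') as f:
    json.dump({'arbitrary_key': structure}, f)

# Later: parse the file and pull the structure back out.
with open('data.json', 'r') as f:
    d = json.load(f)

structure = d['arbitrary_key']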

Personally, I like to use Redis since I don't have to waste time loading the file. All I need to do is start the server and query the database.

pg2455