
I have a big .txt file (about 100 MB) containing a list-like string, as below:

[{'a':'1','b':null}, {'a':'2', 'b':'3'}, {'a':'4', 'b':'5'} ....]

and I want to convert this file to a list or a pandas DataFrame. I am using Anaconda, and I have read the solution in Convert string representation of list to list and tried the code below:

import ast

# Read the whole file into memory and evaluate it as a Python literal
with open('content.txt') as f:
    s = f.read()
l = ast.literal_eval(s)

I first cut a few items from the original file to create a small test case. While the test case was small, this code worked well, but once I passed in the whole big file, Anaconda became really slow and died. Is there an efficient way to handle a big list-like string file like this?

  • This is why I avoid things like Anaconda, there is a lot of overhead. – pstatix Aug 30 '18 at 03:13
  • That isn't a legal Python literal (`null` is not a Python literal), nor is it legal JSON (the property names in JSON must be double-quoted strings, not single-quoted). Where did this come from? The data source seems... problematic, to say the least. – ShadowRanger Aug 30 '18 at 03:15
  • @pstatix: Last I checked, Anaconda is just a Python distribution with a bunch of third party packages installed by default. Those extra packages add some disk space and startup overhead, but shouldn't affect the speed of plain Python code. – ShadowRanger Aug 30 '18 at 03:17
  • @pstatix there isn't any overhead to the Python interpreter that comes with the Anaconda distribution. Sure, it takes more disk-space for the libraries, but if you want those libraries that isn't extra overhead. – juanpa.arrivillaga Aug 30 '18 at 03:19
  • @ShadowRanger: Thanks to your comment, I changed the data to JSON format, and then pandas.read_json() solved my problem (a sketch of that fix follows below). – T.warlock Sep 01 '18 at 10:06
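Following up on T.warlock's comment above, here is a minimal sketch of that fix. It assumes, as in the sample at the top, that single quotes are the only thing making the data invalid JSON, and that none of the values contain quote characters of their own; the blanket quote replacement is an illustration, not part of the original code.

import json
import pandas as pd

# Swap single quotes for double quotes so the property names and values
# become valid JSON strings; JSON also accepts the null tokens that
# ast.literal_eval rejected. This blanket replace is only safe if no
# value contains a quote character (assumption).
with open('content.txt') as f:
    text = f.read().replace("'", '"')

# Parse the cleaned-up JSON array of objects into a DataFrame;
# null becomes NaN in the resulting columns.
df = pd.DataFrame(json.loads(text))

If the cleaned text is first written back to disk as a valid JSON file, pandas.read_json() on that file reads it in one call, which matches what the comment above reports.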

0 Answers