    import json

    def read_in_chunks(file_object, chunk_size=2048):
        """Lazy function (generator) to read a file piece by piece.
        Default chunk size: 2k."""
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data


    f = open('productfeed.json', 'r')
    for piece in read_in_chunks(f):
        print(piece)

I have a big file of 1 GB, so I tried the method above to read it. It prints the values as strings, but I am not able to parse them as JSON because it yields pieces of 2 KB at a time.

I then tried the code below:

    students_d = json.loads(piece)
    students = students_d["products"]  # get list, process with for loop
    for count, student in enumerate(students):
        print(student["product_id"])

and I'm getting an error like this: `ValueError: Unterminated string starting at: line 1 column 1012 (char 1011)`

I don't know how to proceed further. Can someone please help? I am not allowed to use ijson.

chethi
  • You're reading the file in arbitrary chunks. This way you do not have any guarantees regarding where the reading stops, meaning that the json you are trying to load is not complete and valid. – msvalkon Jan 11 '17 at 07:37
  • @msvalkon, thanks, yea, then how to read efficiently? – chethi Jan 11 '17 at 07:42
  • Perhaps http://stackoverflow.com/a/7795029/149076 ... very similar question and a streaming iterative loader. – Jim Dennis Jan 11 '17 at 07:43
  • @JimDennis, not sure how to use it like that, it would be great if you elaborate bit more on how to use – chethi Jan 11 '17 at 07:49
  • If you want to do this properly you'll want to write a json tokenizer which understands the json 'language'. Then you can read a stream byte per byte and build the json using your tokenizer.. You can probably write a naive version of this for a homework assignment, but you'll have to understand tokenization a little bit. – msvalkon Jan 11 '17 at 08:15
  • Possible duplicate of [Opening A large JSON file in Python](https://stackoverflow.com/questions/10715628/opening-a-large-json-file-in-python) – Franck Dernoncourt Feb 12 '18 at 07:42

1 Answer


Use a library for lazy loading (stream parsing): ijson or json-stream.

If you want to write the code yourself, have a look at Python generators and use them to parse the JSON incrementally instead of loading the whole file at once.
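Since ijson is off the table, here is a minimal stdlib-only sketch built on `json.JSONDecoder.raw_decode`, which parses one JSON value from the front of a string and reports how many characters it consumed. It assumes the feed has the shape `{"products": [{...}, {...}, ...]}` (as your second snippet suggests); if your file is laid out differently, the skip-to-the-array logic will need adjusting.

```python
import json

def iter_products(path, chunk_size=2048):
    """Yield objects from the "products" array of a large JSON file
    without loading the whole file into memory.

    Assumes the file looks like {"products": [{...}, {...}, ...]}.
    """
    decoder = json.JSONDecoder()
    buf = ""
    with open(path, "r") as f:
        # Read forward until we find the opening '[' of the array,
        # then drop everything up to and including it.
        while '[' not in buf:
            chunk = f.read(chunk_size)
            if not chunk:
                return  # no array found
            buf += chunk
        buf = buf[buf.index('[') + 1:]

        while True:
            # Skip separators between array elements.
            buf = buf.lstrip().lstrip(',').lstrip()
            if buf.startswith(']'):
                return  # end of the array
            try:
                # Parse one object from the front of the buffer.
                obj, end = decoder.raw_decode(buf)
            except ValueError:
                # The object is split across chunks: read more and retry.
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                buf += chunk
                continue
            buf = buf[end:]
            yield obj
```

With that in place, your loop becomes `for product in iter_products('productfeed.json'): print(product["product_id"])`, and no more than a couple of chunks ever sit in memory at once. A hand-written tokenizer, as suggested in the comments, would be more robust, but `raw_decode` does the per-object parsing for you.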