
I need to read a really big JSONL file from a URL. The approach I am using is as follows:

    import json
    import urllib.request

    bulk_status_info = _get_bulk_info(shop)
    url = bulk_status_info.get('bulk_info').get('url')
    # urlopen returns a file-like response object that can be iterated line by line
    file = urllib.request.urlopen(url)
    for line in file:
        print(json.loads(line.decode("utf-8")))

However, my CPU and memory are limited, so that brings me to two questions:

  1. Is the file loaded all at once, or is there some buffering mechanism to prevent memory from overflowing?
  2. In case my task fails, I want to start from the place where it failed. Is there some sort of cursor I can save? Note that things like seek or tell do not work here since it is not an actual file.

Some additional info: I am using Python 3 and urllib.

urag
  • Whether it is all in memory or buffered will probably depend on the response headers. There is an HTTP header to specify a Streaming response. If it's not present, then I'm not sure urllib would buffer it. – saquintes Jun 23 '21 at 12:53
  • https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests – jhylands Jun 23 '21 at 13:06

1 Answer


Whether the whole file ends up in memory at once depends on how you read the response: urlopen returns a file-like response object, and the data arrives packet by packet, but this is abstracted away by urllib, which reads from the socket as you iterate. Iterating over the response line by line keeps only a buffered chunk in memory at a time. If you want closer access to the stream, there is a way to do it, similar to how it can be done using the requests library, as sketched below.
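
For example, streaming the same URL with the requests library looks roughly like this. This is a sketch, not the asker's code: it reuses the `_get_bulk_info`/`bulk_info` lookup from the question, assumes `shop` is defined, and assumes adding requests as a dependency is acceptable.

    import json
    import requests

    bulk_status_info = _get_bulk_info(shop)
    url = bulk_status_info.get('bulk_info').get('url')

    # stream=True keeps the connection open and reads the body lazily
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        # iter_lines() yields one line (as bytes) at a time from the stream
        for line in response.iter_lines():
            if line:  # skip keep-alive blank lines
                print(json.loads(line.decode("utf-8")))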

Generally there is no way to resume the download of a webpage, or any file request for that matter, unless the server specifically supports it. That would require the server to allow a start point to be specified, which is what HTTP Range requests provide and what video streaming protocols rely on.
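
If the server does advertise support (an Accept-Ranges: bytes response header), a Range request can act as the kind of cursor the question asks about. The sketch below only illustrates that idea with urllib; it does not guarantee that this particular bulk endpoint supports it, and `stream_jsonl` is a hypothetical helper name.

    import json
    import urllib.request

    def stream_jsonl(url, start_byte=0):
        """Yield (byte_offset, record) pairs, optionally resuming at start_byte."""
        request = urllib.request.Request(url)
        if start_byte:
            # Ask the server to send the body starting at start_byte
            request.add_header("Range", f"bytes={start_byte}-")
        offset = start_byte
        with urllib.request.urlopen(request) as response:
            # A 206 status means the server honoured the Range header;
            # a 200 means it ignored it and sent the whole file again.
            for line in response:
                yield offset, json.loads(line.decode("utf-8"))
                offset += len(line)

Persist the offset after each record you finish processing; on restart, pass the last saved value back in as start_byte so iteration resumes at the next line boundary.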

jhylands