Chunked downloading of files with the Google Drive API (v3) can be done using the MediaIoBaseDownload class together with a request object created by request = service.files().get_media(fileId=<id>).
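
For reference, the usual chunked-download loop looks roughly like the sketch below. The helper name `drain` is hypothetical; it relies only on `next_chunk()` returning a `(status, done)` pair, as `MediaIoBaseDownload` does. The stand-in class `FakeDownloader` is also hypothetical and exists only so the loop can run without credentials or network access.

```python
import io


def drain(downloader, on_progress=None):
    """Call next_chunk() until the downloader reports completion.

    `downloader` is anything with a next_chunk() method that returns a
    (status, done) pair, which is the interface MediaIoBaseDownload exposes.
    Returns the number of chunks fetched.
    """
    chunks = 0
    done = False
    while not done:
        status, done = downloader.next_chunk()
        chunks += 1
        if on_progress is not None:
            on_progress(status)
    return chunks


class FakeDownloader:
    """Hypothetical stand-in that copies `data` into `fd` chunk by chunk."""

    def __init__(self, fd, data, chunksize):
        self._fd, self._data, self._chunksize = fd, data, chunksize
        self._progress = 0

    def next_chunk(self):
        chunk = self._data[self._progress:self._progress + self._chunksize]
        self._fd.write(chunk)
        self._progress += len(chunk)
        # The real class returns a MediaDownloadProgress object as status;
        # a plain byte count is used here to keep the sketch self-contained.
        return self._progress, self._progress >= len(self._data)


buffer = io.BytesIO()
n_chunks = drain(FakeDownloader(buffer, b"0123456789", chunksize=4))
```

With the real client, the call would be `drain(MediaIoBaseDownload(fh, request, chunksize=chunksize))`, where `fh` is a writable binary file object.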

Partial downloading can be done by setting the Range field of the HTTP request header, as explained in this post:

request.headers["Range"] = "bytes={}-{}".format(start, start + length - 1)  # byte ranges are inclusive

However, the two cannot be combined, as the byte range information in the header is ignored by MediaIoBaseDownload.

How can a partial download be accomplished in a chunked manner?

user001
  • I think this thread might answer your question: https://stackoverflow.com/a/59764650 – Tanaike Jan 20 '20 at 01:36
  • @Tanaike: Yes, thank you, that's most helpful. However, I would like to combine this with `MediaIoBaseDownload` as I need chunked downloading, and I notice from the linked post that the approach is incompatible with that method. Do you know how to perform a partial download in chunks? – user001 Jan 20 '20 at 01:41
  • Thank you for the quick reply. I think that `MediaIoBaseDownload` might overwrite the request headers, because the modified headers are not used when `MediaIoBaseDownload` is involved. It might be necessary to modify the source of `MediaIoBaseDownload`. I apologize for my incomplete reply. – Tanaike Jan 20 '20 at 01:45
  • @Tanaike: No problem, and thanks again for your help! – user001 Jan 20 '20 at 01:47
  • Looking at [the source code of MediaIoBaseDownload](https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-pysrc.html#MediaIoBaseDownload), I found that `headers['range']` is overwritten in the `next_chunk()` method. I think this is the cause of the issue. – Tanaike Jan 20 '20 at 04:35
  • In other words, `MediaIoBaseDownload` already performs partial (ranged) downloads internally, up to `self._total_size`, the size of the file. So in your case, you could control the range of the partial download by modifying `self._total_size`, but that requires modifying the source code directly. I apologize if this is not useful. – Tanaike Jan 20 '20 at 04:35
  • From the above, even if `MediaIoBaseDownload` were modified to allow a partial download, `MediaIoBaseDownload` and a custom `range` header still could not be used together, because `MediaIoBaseDownload` already issues its own requests with a `range` header. This might be the current answer, but I'm not sure it is the result you want, so I posted it as comments. – Tanaike Jan 20 '20 at 04:41
  • @Tanaike: Thanks. I had checked the source as well and made two changes which should at least allow starting the download from somewhere other than the beginning of the file (useful for resuming an interrupted download). Both changes impact the constructor: [1] the definition of the constructor becomes `def __init__(self, fd, request, chunksize=DEFAULT_CHUNK_SIZE, start=0):`, and [2] the added `start` argument is used to initialize `self._progress = start`. I haven't been able to test whether this works yet. – user001 Jan 20 '20 at 06:33
  • Thank you for replying. I would like to confirm your goal. I had thought that `MediaIoBaseDownload` was not required in your case, but it sounds like you want to use a modified `MediaIoBaseDownload`. Is my understanding correct? – Tanaike Jan 20 '20 at 08:15
  • @Tanaike: My goal is to partially download a file. Because the part to be downloaded is still very large, I wanted to do it in chunks. I know the API provides `MediaIoBaseDownload` for chunked downloads, but if there is another method that accomplishes the same, I'd be open to using it. By the way, I tested my edits, and they indeed allow partial download from byte `N` to `EOF` by running `MediaIoBaseDownload(fh, request, chunksize=chunksize, start=N)`. This was simple since it required changing only two lines. Modifying to allow an ending byte should be straightforward as well. – user001 Jan 20 '20 at 08:37
  • Thank you for replying. I understand your goal now, and I'm glad your issue was resolved. – Tanaike Jan 20 '20 at 08:39
  • @Tanaike: And thanks to you as well. I'm most grateful for your interest and your help. – user001 Jan 20 '20 at 08:42
  • Thank you for your response. I think that your answer is useful. – Tanaike Jan 20 '20 at 23:11
  • @Tanaike: Thanks, it's only a partial answer for the time being, since I mainly needed a way to resume interrupted downloads of large files. – user001 Jan 20 '20 at 23:30

1 Answer

This is a partial answer, which addresses the start byte of a range, but not the end byte.

As Tanaike pointed out in the comments, MediaIoBaseDownload ignores a user-supplied HTTP Range header. A range specified as follows:

request.headers["Range"] = "bytes={}-{}".format(start, start + length - 1)  # byte ranges are inclusive

actually gets copied into self._headers by the MediaIoBaseDownload constructor, but is overwritten on the first call to the next_chunk method, which sets headers['range'] to 'bytes=%d-%d' % (self._progress, self._progress + self._chunksize). Because the constructor sets self._progress = 0, the first call always starts the download from the first (zeroth) byte of the file.

There are a few simple ways to change this. We could check whether request.headers['Range'] exists and parse the specified byte positions. Alternatively, we could expose the behavior directly to the caller by adding keyword arguments to the constructor for the starting and ending byte positions.
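
The first option could look roughly like the sketch below. The helper `parse_byte_range` is hypothetical (it is not part of googleapiclient), and it only handles the single-range forms `bytes=N-M` and `bytes=N-`; suffix ranges such as `bytes=-500` and multi-range headers are rejected.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_byte_range(headers):
    """Return (start, end) from a Range header, or (0, None) if absent.

    `end` is the inclusive last byte position, or None for "until EOF".
    Header names are matched case-insensitively.
    """
    value = None
    for name, candidate in headers.items():
        if name.lower() == "range":
            value = candidate
            break
    if value is None:
        return 0, None

    match = _RANGE_RE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Range header: {!r}".format(value))

    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        raise ValueError("range end precedes start: {!r}".format(value))
    return start, end
```

The returned `start` could then seed `self._progress`, while `end` would cap the range computed in `next_chunk`.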

The following patch (against version 1.7.11 of googleapiclient) adds a start keyword argument to the MediaIoBaseDownload constructor, so that a download can begin from the Nth byte. If no starting byte is specified, the download begins at the start of the file. Since support for an ending byte position has not been implemented, the download always continues until EOF.

--- googleapiclient/http.py.orig    2019-08-05 12:24:31.000000000 -0700
+++ googleapiclient/http.py 2020-01-19 18:31:56.785404831 -0800
@@ -632,7 +632,7 @@
   """

   @util.positional(3)
-  def __init__(self, fd, request, chunksize=DEFAULT_CHUNK_SIZE):
+  def __init__(self, fd, request, chunksize=DEFAULT_CHUNK_SIZE, start=0):
     """Constructor.

     Args:
@@ -646,7 +646,7 @@
     self._request = request
     self._uri = request.uri
     self._chunksize = chunksize
-    self._progress = 0
+    self._progress = start
     self._total_size = None
     self._done = False
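
Support for an end byte, which the patch leaves open, mostly comes down to capping the last chunk's range. Below is a minimal sketch of that logic, independent of googleapiclient: `iter_chunk_ranges` and `download_range` are hypothetical names, and `fetch` is an assumed callable that takes a Range header value and returns the corresponding bytes. Byte positions are inclusive, as in HTTP Range headers.

```python
import io


def iter_chunk_ranges(start, end, chunksize):
    """Yield inclusive (first, last) byte positions covering start..end."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    first = start
    while first <= end:
        last = min(first + chunksize - 1, end)
        yield first, last
        first = last + 1


def download_range(fetch, fd, start, end, chunksize):
    """Write bytes start..end (inclusive) to fd, one ranged request per chunk.

    Returns the number of bytes written.
    """
    written = 0
    for first, last in iter_chunk_ranges(start, end, chunksize):
        data = fetch("bytes={}-{}".format(first, last))
        if not data:
            break  # nothing returned, e.g. the range lies past EOF
        fd.write(data)
        written += len(data)
        if len(data) < last - first + 1:
            break  # short read: the file ended before `end`
    return written


# Stand-in for a remote file so the sketch runs without network access.
REMOTE = bytes(range(100))


def fake_fetch(range_header):
    first, last = (int(x) for x in range_header[len("bytes="):].split("-"))
    return REMOTE[first:last + 1]


out = io.BytesIO()
count = download_range(fake_fetch, out, start=10, end=59, chunksize=16)
```

With the Drive client, `fetch` would presumably wrap a GET on `request.uri` with the Range header set; how that request is issued is an assumption left outside this sketch.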
user001