Goal
Download a file and upload it to Google Drive purely in-memory, using the Google Drive API's resumable upload URL.
Challenge / Problem
I want to buffer the file into memory (not the filesystem) as it's being downloaded, and then upload it to Google Drive. The Drive resumable upload API requires each chunk to be a multiple of 256 * 1024 bytes (262,144 bytes), except for the final chunk.
The process should pass a chunk from the buffer to be uploaded. If the chunk errors, that buffer chunk is retried up to 3 times. If the chunk succeeds, that chunk is cleared from the buffer, and the process continues until the transfer is complete.
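For concreteness, the per-chunk retry behavior I have in mind is roughly this (a minimal sketch; uploadChunk is a hypothetical helper that PUTs one chunk to the resumable URL):
const CHUNK_SIZE = 256 * 1024   // Drive chunk sizes must be multiples of this (except the last chunk)
const MAX_RETRIES = 3
// Retry a single chunk up to MAX_RETRIES times before giving up.
// uploadChunk(...) is a placeholder for whatever actually PUTs the chunk to the resumable URL.
async function uploadChunkWithRetry(chunk, offset, totalLength) {
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      return await uploadChunk(chunk, offset, totalLength)
    } catch (error) {
      if (attempt === MAX_RETRIES) throw error
      console.warn(`Chunk at offset ${offset} failed (attempt ${attempt}), retrying...`)
    }
  }
}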
Background Efforts / Research (references below)
Most of the articles, examples, and packages I've researched and tested have given some insight into streaming, piping, and chunking, but they use the filesystem as the starting point for the readable stream.
I've tried different approaches with streams, such as a PassThrough with a highWaterMark, and third-party libraries such as request, gaxios, and got, which have built-in stream/piping support, but to no avail on the upload end of the process.
Meaning, I am not sure how to structure the piping or chunking mechanism, whether with a buffer or pipeline, to properly flow into the upload process until completion, and how to handle the progress and finalizing events in an efficient manner.
Questions
1. With the code below, how do I appropriately buffer the file and PUT it to the Google-provided URL with the correct Content-Length and Content-Range headers, while keeping enough buffer space to handle 3 retries?
2. In terms of handling back-pressure or buffering, is leveraging .cork() and .uncork() an efficient way to manage the buffer flow?
3. Is there a way to use a Transform stream with highWaterMark and pipeline to manage the buffer efficiently? e.g.:
pipeline(
  downloadStream,
  transformStream,
  uploadStream,
  (err) => {
    if (err) {
      reject(err)
    } else {
      resolve(true)
    }
  }
)
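For reference, my understanding of what each chunk PUT to the resumable session needs to look like is roughly the following (illustrative only, assuming axios is in scope; putChunk and the offsets are not working code from my project):
// Illustrative: PUT one chunk of chunk.length bytes, starting at byte `offset`,
// into a resumable session whose total size is totalLength bytes.
async function putChunk(resumable_drive_url, chunk, offset, totalLength) {
  return axios.put(resumable_drive_url, chunk, {
    headers: {
      "Content-Length": chunk.length,
      "Content-Range": `bytes ${offset}-${offset + chunk.length - 1}/${totalLength}`
    },
    // Drive responds 308 (Resume Incomplete) to intermediate chunks,
    // so don't let axios treat that status as an error.
    validateStatus: (status) => (status >= 200 && status < 300) || status === 308
  })
}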
Below is a visual model and the code of what I'm trying to accomplish:
Visual Example
[====================]
File Length (20 MB)
[========== ]
Download (10 MB)
[====== ]
Buffer (e.g. 6 MB, size 12 MB)
[===]
Upload Chunk (3 MB) => Error? Retry from Buffer (max 3 times)
=> Success? Empty Buffer => Continue =>
[===]
Upload next Chunk (3 MB)
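(As a sanity check on the numbers in the diagram, a 3 MB upload chunk is still a clean multiple of the 256 KiB requirement:)
const MIN_CHUNK = 256 * 1024            // 262,144 bytes
const UPLOAD_CHUNK = 3 * 1024 * 1024    // 3,145,728 bytes (3 MB)
console.log(UPLOAD_CHUNK % MIN_CHUNK)   // 0  -> an acceptable Drive chunk size
console.log(UPLOAD_CHUNK / MIN_CHUNK)   // 12 -> twelve 256 KiB blocks per upload chunk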
Code
/*
  Assume resumable_drive_url was already obtained from the Google API
  with the proper access token, and that the session was initiated with
  the Content-Type and Content-Length.
  Also assumes axios and Node's stream module are in scope:
  const axios = require("axios"); const stream = require("stream");
*/
transfer(download_url, resumable_drive_url, file_type, file_length) {
  return new Promise((resolve, reject) => {
    const timeout = setTimeout(() => {
      reject(new Error("Transfer timed out."))
    }, 80000)

    // Question #1: Should the passthrough stream
    // and .on events be declared here?
    const passthrough = new stream.PassThrough({
      highWaterMark: 256 * 1024
    })

    passthrough.on("error", (error) => {
      console.error(`Upload failed: ${error.message}`)
      clearTimeout(timeout)
      reject(error)
    })

    passthrough.on("end", () => {
      clearTimeout(timeout)
      resolve(true)
    })

    // Download file
    axios({
      method: 'get',
      url: download_url,
      responseType: 'stream',
      maxRedirects: 1
    }).then(result => {
      // QUESTION #2: How do we buffer the file from here
      // via axios.put to the resumable_url with the correct
      // header information Content-Range and Content-Length?

      // CURIOSITY #1: Do we pipe from here
      // to a passthrough stream that maintains a minimum buffer size?
      result.data.pipe(passthrough)
    }).catch(error => {
      clearTimeout(timeout)
      reject(error)
    })
  })
}
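Presumably the missing piece after result.data.pipe(passthrough) is something that consumes the passthrough and issues the chunked PUTs. A sketch of the shape I imagine, reusing the illustrative putChunk helper from the Questions section (not tested; the retry wrapper from earlier would slot in around the PUT):
// Sketch only: consume the passthrough stream and upload it in 256 KiB multiples.
// putChunk is the illustrative helper shown under Questions above.
async function uploadFromStream(passthrough, resumable_drive_url, file_length) {
  const CHUNK_SIZE = 256 * 1024
  let buffered = Buffer.alloc(0)   // in-memory buffer (never touches the filesystem)
  let offset = 0                   // bytes already sent to Drive

  // for await respects back-pressure: the stream is paused while a chunk uploads.
  for await (const data of passthrough) {
    buffered = Buffer.concat([buffered, data])
    while (buffered.length >= CHUNK_SIZE) {
      await putChunk(resumable_drive_url, buffered.subarray(0, CHUNK_SIZE), offset, file_length)
      buffered = buffered.subarray(CHUNK_SIZE)   // success: drop the uploaded bytes
      offset += CHUNK_SIZE
    }
  }

  // Final chunk (may be shorter than 256 KiB) once the download has ended.
  if (buffered.length > 0) {
    await putChunk(resumable_drive_url, buffered, offset, file_length)
  }
}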
References
- Chunked Upload Class - (Decent chunking mechanism but bloated; seems there is a more efficient approach with stream piping)
- Google Drive API v3 - Upload via Resumable URL with Multiple Requests
- resumableUpload.js - (Conceptually right but uses File System)
- Google-Drive-Uploader - (Conceptually right but uses File System and custom StreamFactory)
- Resumable upload in Drive Rest API V3 - (Decent but seems bloated and antiquated)