I'm working on a project where I'm reading mind map files created with SimpleMind from Google Drive, modifying the files, and then uploading them back to Google Drive.
The SMMX files created by SimpleMind are zip files which contain XML files and media files.
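Just to illustrate the format (this is not part of the program discussed below, and adm-zip is only an example library, not something SimpleMind or my program uses), the entries of such an archive can be listed with any zip library:

const AdmZip = require('adm-zip');

// An .smmx file is an ordinary zip archive, so its entries
// (the mind-map XML plus any media files) can be listed like this:
const zip = new AdmZip('some-mind-map.smmx'); // hypothetical file name
zip.getEntries().forEach(entry => console.log(entry.entryName));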
My program works just fine when I run it locally: the changes I make to the mind map show up in SimpleMind.
I now want to run the program on the Google Cloud Platform, using App Engine.
I can't just write the file I downloaded from Google Drive to the file system of the app server in the cloud because of security restrictions. Instead, I've created a storage bucket and store the file there.
When I do this, however, my file gets corrupted: after I run my program, the file no longer contains the zip contents but a JSON document, apparently a string representation of the read stream.
Running Locally – Working
This is a simplified version of my code. It does not actually modify the zip file; I've left that out because it's irrelevant to the problem, along with any error handling – there are never any errors.
When I run the code locally, I use a write stream and a read stream to save and load the file on my local file system:
#!/usr/bin/env node

const { readFileSync, createReadStream, createWriteStream } = require('fs');
const { google } = require('googleapis');

const tokenPath = 'google-drive-token.json';
const clientId = 'xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com';
const redirectUri = 'urn:ietf:wg:oauth:2.0:oob';
const clientSecret = 'xxxxxxxxxxxxxxxxxxxxxxxx';
const fileId = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const fileName = 'deleteme.smmx';

(async () => {
  const auth = new google.auth.OAuth2(clientId, clientSecret, redirectUri);
  const token = JSON.parse(readFileSync(tokenPath));
  auth.setCredentials(token);
  const writeStream = createWriteStream(fileName);
  const drive = google.drive({ version: 'v3', auth });
  let progress = 0;
  const res = await drive.files.get({ fileId, alt: 'media' }, { responseType: 'stream' });
  await new Promise(resolve => {
    res.data.on('data', d => (progress += d.length)).pipe(writeStream);
    writeStream.on('finish', () => {
      console.log(`Done downloading file ${fileName} from Google Drive to local file system (${progress} bytes)`);
      resolve();
    });
  });
  const readStream = createReadStream(fileName);
  progress = 0;
  const media = {
    mimeType: 'application/x-zip',
    body: readStream
      .on('data', d => {
        progress += d.length;
      })
      .on('end', () => console.log(`${progress} bytes read from local file system`))
  };
  await drive.files.update({
    fileId,
    media
  });
  console.log(`File ${fileName} successfully uploaded to Google Drive`);
})();
When I run this script locally, it works fine; the program output is always:
Done downloading file deleteme.smmx from Google Drive to local file system (371 bytes)
371 bytes read from local file system
File deleteme.smmx successfully uploaded to Google Drive
I can run it as many times as I want; a new version of the file is created on Google Drive every time, each 371 bytes large.
Running in the Google Cloud – Not Working
Here is a version of the script above that I'm using to try to do the same thing – download a file from Google Drive and upload it back – in the Google Cloud, running on App Engine:
const { readFileSync } = require('fs');
const { google } = require('googleapis');
const { Storage } = require('@google-cloud/storage');

const tokenPath = 'google-drive-token.json';
const clientId = 'xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com';
const redirectUri = 'urn:ietf:wg:oauth:2.0:oob';
const clientSecret = 'xxxxxxxxxxxxxxxxxxxxxxxx';
const fileId = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const fileName = 'deleteme.smmx';
const storageBucketId = 'xxxxxxxxxxx';

module.exports = async () => {
  const auth = new google.auth.OAuth2(clientId, clientSecret, redirectUri);
  const token = JSON.parse(readFileSync(tokenPath));
  auth.setCredentials(token);
  const storage = new Storage();
  const bucket = storage.bucket(storageBucketId);
  const file = bucket.file(fileName);
  const writeStream = file.createWriteStream({ resumable: false });
  const drive = google.drive({ version: 'v3', auth });
  let progress = 0;
  const res = await drive.files.get({ fileId, alt: 'media' }, { responseType: 'stream' });
  await new Promise(resolve => {
    res.data.on('data', d => (progress += d.length)).pipe(writeStream);
    writeStream.on('finish', () => {
      console.log(`Done downloading file ${fileName} from Google Drive to Cloud bucket (${progress} bytes)`);
      resolve();
    });
  });
  const readStream = file.createReadStream();
  progress = 0;
  const media = {
    mimeType: 'application/x-zip',
    body: readStream
      .on('data', d => {
        progress += d.length;
      })
      .on('end', () => console.log(`${progress} bytes read from storage`))
  };
  await drive.files.update({
    fileId,
    media
  });
  console.log(`File ${fileName} successfully uploaded to Google Drive`);
  return 0;
};
The only difference here is that instead of using createWriteStream and createReadStream from the Node.js fs
module, I'm using the corresponding methods file.createWriteStream and file.createReadStream from the Google Cloud Storage library.
When I run this code on App Engine in the Cloud for the first time, everything seems OK – the output is the same as when I run locally:
Done downloading file deleteme.smmx from Google Drive to Cloud bucket (371 bytes)
371 bytes read from storage
File deleteme.smmx successfully uploaded to Google Drive
When I look at the latest version of the file in the Google Drive web frontend, however, it is not my smmx file anymore but a JSON file; it looks like a string representation of the read stream:
{
"_readableState": {
"objectMode": false,
"highWaterMark": 16384,
"buffer": { "head": null, "tail": null, "length": 0 },
"length": 0,
"pipes": null,
"pipesCount": 0,
"flowing": true,
"ended": false,
"endEmitted": false,
"reading": false,
"sync": false,
"needReadable": true,
"emittedReadable": false,
"readableListening": false,
"resumeScheduled": true,
"paused": false,
"emitClose": true,
"destroyed": false,
"defaultEncoding": "utf8",
"awaitDrain": 0,
"readingMore": false,
"decoder": null,
"encoding": null
},
"readable": true,
"_events": {},
"_eventsCount": 4,
"_writableState": {
"objectMode": false,
"highWaterMark": 16384,
"finalCalled": false,
"needDrain": false,
"ending": false,
"ended": false,
"finished": false,
"destroyed": false,
"decodeStrings": true,
"defaultEncoding": "utf8",
"length": 0,
"writing": false,
"corked": 0,
"sync": true,
"bufferProcessing": false,
"writecb": null,
"writelen": 0,
"bufferedRequest": null,
"lastBufferedRequest": null,
"pendingcb": 0,
"prefinished": false,
"errorEmitted": false,
"emitClose": true,
"bufferedRequestCount": 0,
"corkedRequestsFree": { "next": null, "entry": null }
},
"writable": true,
"allowHalfOpen": true,
"_transformState": {
"needTransform": false,
"transforming": false,
"writecb": null,
"writechunk": null,
"writeencoding": null
},
"_destroyed": false
}
It seems that passing a read stream from a Cloud Storage bucket as the media body of a Google Drive upload does not work the way I'd like it to.
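To narrow this down, one thing I could do is log a few checks on the stream right before the drive.files.update call (this is only a guess at the failure mode – perhaps the client library does not recognize the bucket stream as a Node.js readable stream and serializes it as JSON instead):

const { Readable } = require('stream');

// Compare this output between the local run and the App Engine run;
// a difference here would point to the bucket stream not being treated
// as a regular Node.js readable stream by the upload code.
console.log('instanceof Readable:', readStream instanceof Readable);
console.log('has pipe():', typeof readStream.pipe === 'function');
console.log('constructor name:', readStream.constructor && readStream.constructor.name);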
What am I doing wrong? What do I need to change so that my code runs correctly in the Cloud?
If you're interested, the full source code of my project can be found on GitHub.
Update: Workaround
I've found a way to work around this problem:
- Read the data from the read stream from the cloud storage bucket into a buffer
- Create a readable stream from this buffer, as described in this tutorial
- Pass this “buffer stream” to the drive.files.update method
This way, the zip file on Google Drive does not get corrupted; a new version is stored with the same content as before, as expected.
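Here is a minimal sketch of that workaround (the variable names are mine, error handling is again left out, and I'm using a PassThrough stream for the buffer-to-stream step, which may differ slightly from the tutorial's exact approach):

const { PassThrough } = require('stream');

// Read the whole file from the Cloud Storage bucket into memory.
const chunks = [];
for await (const chunk of file.createReadStream()) {
  chunks.push(chunk);
}
const buffer = Buffer.concat(chunks);

// Wrap the buffer in a fresh stream and hand that to the Drive API
// instead of the bucket's read stream.
const bufferStream = new PassThrough();
bufferStream.end(buffer);

await drive.files.update({
  fileId,
  media: { mimeType: 'application/x-zip', body: bufferStream }
});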
I find this rather ugly, however. With large mind map files, e.g. ones with many images in them, it will stress the server, since the whole content of the file has to be held in memory.
I would prefer to make the direct piping from the cloud storage bucket to the Google Drive API work.