I'm working on a project where I'm reading mind map files created with SimpleMind from Google Drive, modifying the files, and then uploading them back to Google Drive.
The SMMX files created by SimpleMind are zip files which contain XML files and media files.
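Just to illustrate the format (this is not part of the program discussed below, and adm-zip is only an example library, not something SimpleMind or my program uses), the entries of such an archive can be listed with any zip library:

const AdmZip = require('adm-zip');

// An .smmx file is an ordinary zip archive, so its entries
// (the mind-map XML plus any media files) can be listed like this:
const zip = new AdmZip('some-mind-map.smmx'); // hypothetical file name
zip.getEntries().forEach(entry => console.log(entry.entryName));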
My program works just fine when I run it locally: the changes I make to the mind map show up in SimpleMind.
I now want to run the program on the Google Cloud Platform, using App Engine.
I can't just write the file I downloaded from Google Drive to the file system of the app server in the cloud because of security restrictions. Instead, I've created a storage bucket and store the file there.
When I do this, however, my file gets corrupted: after I run my program, the file no longer contains the zip contents but a JSON document, apparently a string representation of the read stream.
Running Locally – Working
This is a simplified version of my code. It does not actually modify the zip file; I've left that out because it's irrelevant to the problem, along with any error handling – there are never any errors.
When I run the code locally, I use a write stream and a read stream to save and load the file on my local file system:
#!/usr/bin/env node

const { readFileSync, createReadStream, createWriteStream } = require('fs');
const { google } = require('googleapis');

const tokenPath = 'google-drive-token.json';
const clientId = 'xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com';
const redirectUri = 'urn:ietf:wg:oauth:2.0:oob';
const clientSecret = 'xxxxxxxxxxxxxxxxxxxxxxxx';
const fileId = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const fileName = 'deleteme.smmx';

(async () => {
  const auth = new google.auth.OAuth2(clientId, clientSecret, redirectUri);
  const token = JSON.parse(readFileSync(tokenPath));
  auth.setCredentials(token);
  const writeStream = createWriteStream(fileName);
  const drive = google.drive({ version: 'v3', auth });
  let progress = 0;
  const res = await drive.files.get({ fileId, alt: 'media' }, { responseType: 'stream' });
  await new Promise(resolve => {
    res.data.on('data', d => (progress += d.length)).pipe(writeStream);
    writeStream.on('finish', () => {
      console.log(`Done downloading file ${fileName} from Google Drive to local file system (${progress} bytes)`);
      resolve();
    });
  });
  const readStream = createReadStream(fileName);
  progress = 0;
  const media = {
    mimeType: 'application/x-zip',
    body: readStream
      .on('data', d => {
        progress += d.length;
      })
      .on('end', () => console.log(`${progress} bytes read from local file system`))
  };
  await drive.files.update({
    fileId,
    media
  });
  console.log(`File ${fileName} successfully uploaded to Google Drive`);
})();
When I run this script locally, it works fine; the program output is always:
Done downloading file deleteme.smmx from Google Drive to local file system (371 bytes)
371 bytes read from local file system
File deleteme.smmx successfully uploaded to Google Drive
I can run it as many times as I want; a new version of the file is created on Google Drive every time, each 371 bytes large.
Running in the Google Cloud – Not Working
Here is a version of the script above that I'm using to try to do the same thing – download a file from Google Drive and upload it back – in the Google Cloud, running on App Engine:
const { readFileSync } = require('fs');
const { google } = require('googleapis');
const { Storage } = require('@google-cloud/storage');

const tokenPath = 'google-drive-token.json';
const clientId = 'xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com';
const redirectUri = 'urn:ietf:wg:oauth:2.0:oob';
const clientSecret = 'xxxxxxxxxxxxxxxxxxxxxxxx';
const fileId = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
const fileName = 'deleteme.smmx';
const storageBucketId = 'xxxxxxxxxxx';

module.exports = async () => {
  const auth = new google.auth.OAuth2(clientId, clientSecret, redirectUri);
  const token = JSON.parse(readFileSync(tokenPath));
  auth.setCredentials(token);
  const storage = new Storage();
  const bucket = storage.bucket(storageBucketId);
  const file = bucket.file(fileName);
  const writeStream = file.createWriteStream({ resumable: false });
  const drive = google.drive({ version: 'v3', auth });
  let progress = 0;
  const res = await drive.files.get({ fileId, alt: 'media' }, { responseType: 'stream' });
  await new Promise(resolve => {
    res.data.on('data', d => (progress += d.length)).pipe(writeStream);
    writeStream.on('finish', () => {
      console.log(`Done downloading file ${fileName} from Google Drive to Cloud bucket (${progress} bytes)`);
      resolve();
    });
  });
  const readStream = file.createReadStream();
  progress = 0;
  const media = {
    mimeType: 'application/x-zip',
    body: readStream
      .on('data', d => {
        progress += d.length;
      })
      .on('end', () => console.log(`${progress} bytes read from storage`))
  };
  await drive.files.update({
    fileId,
    media
  });
  console.log(`File ${fileName} successfully uploaded to Google Drive`);
  return 0;
};
The only difference here is that instead of using createWriteStream and createReadStream from the Node.js fs
module, I'm using the corresponding methods file.createWriteStream and file.createReadStream from the Google Cloud Storage library.
When I run this code on App Engine in the Cloud for the first time, everything seems OK – the output is the same as when I run locally:
Done downloading file deleteme.smmx from Google Drive to Cloud bucket (371 bytes)
371 bytes read from storage
File deleteme.smmx successfully uploaded to Google Drive
When I look at the latest version of the file in the Google Drive web frontend, however, it is not my smmx file anymore but a JSON file; it looks like a string representation of the read stream:
{
"_readableState": {
"objectMode": false,
"highWaterMark": 16384,
"buffer": { "head": null, "tail": null, "length": 0 },
"length": 0,
"pipes": null,
"pipesCount": 0,
"flowing": true,
"ended": false,
"endEmitted": false,
"reading": false,
"sync": false,
"needReadable": true,
"emittedReadable": false,
"readableListening": false,
"resumeScheduled": true,
"paused": false,
"emitClose": true,
"destroyed": false,
"defaultEncoding": "utf8",
"awaitDrain": 0,
"readingMore": false,
"decoder": null,
"encoding": null
},
"readable": true,
"_events": {},
"_eventsCount": 4,
"_writableState": {
"objectMode": false,
"highWaterMark": 16384,
"finalCalled": false,
"needDrain": false,
"ending": false,
"ended": false,
"finished": false,
"destroyed": false,
"decodeStrings": true,
"defaultEncoding": "utf8",
"length": 0,
"writing": false,
"corked": 0,
"sync": true,
"bufferProcessing": false,
"writecb": null,
"writelen": 0,
"bufferedRequest": null,
"lastBufferedRequest": null,
"pendingcb": 0,
"prefinished": false,
"errorEmitted": false,
"emitClose": true,
"bufferedRequestCount": 0,
"corkedRequestsFree": { "next": null, "entry": null }
},
"writable": true,
"allowHalfOpen": true,
"_transformState": {
"needTransform": false,
"transforming": false,
"writecb": null,
"writechunk": null,
"writeencoding": null
},
"_destroyed": false
}
It seems that passing a read stream from a Cloud Storage bucket as the media body of a Google Drive upload does not work the way I'd like it to.
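To narrow this down, one thing I could do is log a few checks on the stream right before the drive.files.update call (this is only a guess at the failure mode – perhaps the client library does not recognize the bucket stream as a Node.js readable stream and serializes it as JSON instead):

const { Readable } = require('stream');

// Compare this output between the local run and the App Engine run;
// a difference here would point to the bucket stream not being treated
// as a regular Node.js readable stream by the upload code.
console.log('instanceof Readable:', readStream instanceof Readable);
console.log('has pipe():', typeof readStream.pipe === 'function');
console.log('constructor name:', readStream.constructor && readStream.constructor.name);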
What am I doing wrong? What do I need to change so that my code runs correctly in the Cloud?
If you're interested, the full source code of my project can be found on GitHub.
Update: Workaround
I've found a way to work around this problem:
- Read the data from the read stream from the cloud storage bucket into a buffer
- Create a readable stream from this buffer, as described in this tutorial
- Pass this “buffer stream” to the drive.files.update method
This way, the zip file on Google Drive does not get corrupted; a new version is stored with the same content as before, as expected.
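Here is a minimal sketch of that workaround (the variable names are mine, error handling is again left out, and I'm using a PassThrough stream for the buffer-to-stream step, which may differ slightly from the tutorial's exact approach):

const { PassThrough } = require('stream');

// Read the whole file from the Cloud Storage bucket into memory.
const chunks = [];
for await (const chunk of file.createReadStream()) {
  chunks.push(chunk);
}
const buffer = Buffer.concat(chunks);

// Wrap the buffer in a fresh stream and hand that to the Drive API
// instead of the bucket's read stream.
const bufferStream = new PassThrough();
bufferStream.end(buffer);

await drive.files.update({
  fileId,
  media: { mimeType: 'application/x-zip', body: bufferStream }
});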
I find this rather ugly, however. With large mind map files, e.g. ones with many images in them, it will stress the server, since the whole content of the file has to be held in memory.
I would prefer to make the direct piping from the cloud storage bucket to the Google Drive API work.