1

I am trying to upload large files to a Google bucket from Node.js. Uploading any file under or around the 200 MB mark works perfectly fine. Anything larger than that returns an error:

Cannot create a string longer than 0x1fffffe8 characters

Having a file that big, I have found out that Node does have limitations on how large a blob/file can be.
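For reference, 0x1fffffe8 is the engine's maximum string length, which Node exposes as buffer.constants.MAX_STRING_LENGTH; a minimal sketch (assuming Node 8.3+) to print the cap on your own build:

    // Print the largest number of characters a single string may hold,
    // i.e. the limit behind the error above (e.g. 0x1fffffe8 here).
    const { constants } = require("buffer");
    console.log("0x" + constants.MAX_STRING_LENGTH.toString(16));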

Here are the two code snippets that both throw the same error. This one uses upload streaming:

let fileSize = file.size;
fs.createReadStream(file)
  .pipe(
    upload({
      bucket: BUCKET,
      file: file,
    })
  )
  .on("progress", (progress) => {
    console.log("Progress event:");
    console.log("\t bytes: ", progress.bytesWritten);
    const pct = Math.round((progress.bytesWritten / fileSize) * 100);
    console.log(`\t ${pct}%`);
  })
  .on("finish", (test) => {
    console.log(test);
    console.log("Upload complete!");
    resolve();
  })
  .on("error", (err) => {
    console.error("There was a problem uploading the file");
    reject(err);
  });

and of course just a regular bucket upload:

await storage.bucket(BUCKET).upload(file.path, {
  destination: file.name,
});

I have come to terms with the fact that the only solution is to chunk the file, upload it in chunks, and rejoin the chunks in the bucket. The problem is that I don't know how to do that, and I can't find any documentation on Google or GitHub for this case.

wowza_MAN
  • Perhaps [this](https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads) can be of assistance. https://cloud.google.com/storage/docs/composing-objects#storage_compose_object-nodejs – dany L Oct 19 '21 at 19:15
  • Uploading large objects reliably takes a fair amount of code and testing. The edge and corner cases will have you fixing bugs for weeks/months. I recommend using a published code module that someone else has written, tested, and deployed. 1) license/purchase production code; 2) clone a well-supported GitHub library; 3) use the Google SDK and move on to the next task. – John Hanley Oct 19 '21 at 19:35
  • @danyL Thank you so much for that reference! I ended up splitting the files into smaller ones, uploading them, then joining them. Works like a charm. Thanks a bunch – wowza_MAN Oct 19 '21 at 19:50
  • @wowza_MAN you can add your implemented approach for future community reference ;) – tmarwen Oct 19 '21 at 20:47
  • Out of curiosity, may I know what the file size is? I recall reading somewhere that the max size is 5TB per object for Google Cloud Storage, and I often wonder how long that would take and whether it can go uninterrupted. – FlyingPenguin Oct 20 '21 at 07:51
  • @FlyingPenguin how long it will take depends roughly on the wire speed and on the network approach used to connect to the Google VPC. Approximately, for a network with 1 Gbps bandwidth, a 5 TB file will take around 15 hours to transfer, and yes, interruptions may still occur. – tmarwen Oct 20 '21 at 13:03
  • Hello, I agree with @tmarwen; it would be helpful if you shared your implemented approach with the community. – vi calderon Oct 20 '21 at 21:31

3 Answers

2

To resolve this issue, I check the file size to see if it is larger than 200 MB. If it is, I split it into (roughly) 200 MB chunks, upload each one individually, then join the files with bucket.combine().

A very important note is to add the timeout. By default Google has a 1-minute file upload timeout; I have set it to 60 minutes in the snippet below. It is a very hacky approach, I must admit.

// "split-file" npm package – provides splitFileBySize()
const splitFile = require("split-file");

if (file.size > 209715200) {
  await splitFile
    // split the source file on disk into ~200 MB part files
    .splitFileBySize(file.path, 2e8)
    .then(async (names) => {
      console.log(names);

      // upload every chunk individually, with a 60-minute timeout
      for (let i = 0; i < names.length; i++) {
        console.log("uploading " + names[i]);
        await storage
          .bucket(BUCKET)
          .upload(names[i], {
            destination: names[i],
            timeout: 3600000,
          })
          .catch((err) => {
            return { status: err };
          });
      }

      // join the uploaded chunks back into a single object
      // (note: combine/compose accepts at most 32 source objects per call)
      await storage
        .bucket(BUCKET)
        .combine(names, file.name)
        .catch((err) => {
          return { status: err };
        });

      // clean up the chunk objects in the bucket
      for (let i = 0; i < names.length; i++) {
        console.log("deleting " + names[i]);
        await storage
          .bucket(BUCKET)
          .file(names[i])
          .delete()
          .then(() => {
            console.log(`Deleted ${names[i]}`);
          })
          .catch((err) => {
            return { status: err };
          });
      }

      console.log("done");
      return { status: "ok" };
    });
}
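
One note on cleanup: splitFileBySize writes the part files to local disk, and the snippet above only deletes the chunk objects in the bucket. A minimal sketch for removing the local part files afterwards (assuming Node 14+ for fs/promises and the same names array) could be:

    // Delete the local part files created by splitFileBySize;
    // `names` holds the paths the library resolved with.
    const fs = require("fs/promises");
    await Promise.all(names.map((part) => fs.unlink(part)));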
wowza_MAN
1

I think an external user shouldn't have to worry about the upload file size if Google Storage supports objects of up to 5TB. I submitted an issue to the Google team: https://github.com/googleapis/nodejs-storage/issues/2167

0

There is an option in the Google Storage API itself that does the upload in chunks, so there is no need to implement it yourself; just use the chunkSize option:

    const options = {
      destination: destination,
      resumable: false,
      validation: 'crc32c',
      chunkSize: 100 * 2 ** 20, // 100 MiB per chunk
    };

    const [file] = await this.bucket.upload(filePath, options);
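
If you want to keep the streaming approach from the question, chunkSize can also be passed to createWriteStream; a rough sketch (assuming a recent @google-cloud/storage version, Node 15+ for stream/promises, a resumable upload so that chunkSize applies, and the same this.bucket, filePath and destination as above):

    // Rough sketch: stream the local file into the bucket in 100 MiB chunks
    // instead of buffering it in memory.
    const fs = require("fs");
    const { pipeline } = require("stream/promises"); // Node 15+

    await pipeline(
      fs.createReadStream(filePath),
      this.bucket.file(destination).createWriteStream({
        resumable: true,
        chunkSize: 100 * 2 ** 20,
      })
    );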