
(Node.js API)

I am trying to do the following:

  1. Generate a file path like /uploads/${uuid.v4()}.extension
  2. Write the file.

This is the code:

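    const { extname } = require('path');
    const { pipeline } = require('stream/promises');
    const { v4: uuidv4 } = require('uuid');

    // extname() already includes the leading dot (e.g. ".png").
    const objectName = `/uploads/${uuidv4()}${extname(fileName)}`;
    const file = bucket.file(objectName);
    // pipeline() rejects if either stream errors and resolves once the upload finishes.
    await pipeline(data, file.createWriteStream({ contentType }));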

It works fine. But it bothers me to no end that there is a minuscule probability that the same UUID will be generated twice, even though it is not a practical concern.

How can I upload data to Cloud Storage but get an error if there's a name clash? I could check whether the file exists beforehand, but technically there would still be a race condition.

Doug Stevenson
Eugene
  • I'm wondering if Object Versioning might hold some possibilities? See ... https://cloud.google.com/storage/docs/object-versioning. If I'm grokking this correctly, then if two GCS objects with the same name were created concurrently AND object versioning was enabled, we could (in theory) detect the collision by asking GCS for the details of the file, including its versions. I am imagining two identically named GCS files being created at the exact same time... when we later query the "file", we will find it has two versions. – Kolban Aug 18 '22 at 02:52
  • @Kolban Then how does each uploading client reconcile the fact that it might have uploaded version n>1 of the object, and how does it fall back from that? Not trivially. Without a transactional API, everyone is just blindly uploading new versions. – Doug Stevenson Aug 18 '22 at 04:19

2 Answers


The chance of a collision is not just minuscule: it's astronomically low for UUIDs of significant size. Solving the problem of such a collision is not likely to be worth the effort.
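To put "astronomically low" in numbers: a version-4 UUID has 122 random bits (the other 6 bits are fixed version and variant bits). Assuming the IDs come from a good random source, the standard birthday approximation gives the probability of at least one collision among n generated IDs as roughly:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{122}}\right) \approx \frac{n^2}{2^{123}}
\qquad \text{e.g.}\quad p\!\left(10^{12}\right) \approx \frac{10^{24}}{2^{123}} \approx 9.4 \times 10^{-14}
```

So even after a trillion uploads, the chance that any two of them ever shared a name would be on the order of one in ten trillion.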

That said, if you still want to, you won't be able to do it with Cloud Storage APIs alone, since there is no transactional, atomic API to interact with. If you want a "hard" guarantee that there is no collision, you will need to interact with an entirely different Cloud service that does allow you to effectively "lock" some unique string (e.g. a file path) as a flag for all other processes to check so that they don't collide. Since you are working in Google Cloud, you might want to consider using a database (like any SQL database, or Firestore) with atomic transactional operations to "reserve" the path so that only one process can use it (assuming they all correctly observe this reservation and cooperate as such).
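A minimal sketch of the reservation idea, assuming a store that offers an atomic "create only if absent" operation. In Firestore, `DocumentReference.create()` fails if the document already exists, so it could play that role. `ReservationStore`, `tryReserve` and `reservePath` are hypothetical names, an in-memory `Set` stands in for the database so the sketch runs on its own, and Node's built-in `crypto.randomUUID()` replaces the `uuid` package.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical stand-in for a database with an atomic "create if absent"
// operation (the role Firestore's DocumentReference.create() could play).
class ReservationStore {
  constructor() {
    this.keys = new Set();
  }

  // Resolves true if this call reserved the key, false if it was already taken.
  // The check and the insert run without an await in between, so they are
  // atomic within this single Node.js process.
  async tryReserve(key) {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// Reserve a unique object path before uploading. Only the caller whose
// reservation succeeded should write to that path in Cloud Storage.
async function reservePath(store, ext, newId = randomUUID, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const objectPath = `/uploads/${newId()}${ext}`;
    if (await store.tryReserve(objectPath)) return objectPath;
  }
  throw new Error(`Could not reserve a path after ${maxAttempts} attempts`);
}
```

Each uploader would call `reservePath()` first and upload only under a path it managed to reserve; as noted above, this only holds if every writer goes through the same reservation step.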

Doug Stevenson
  • I know it is really low. I am working on a toy project and this is just a mental exercise for me. I know I can use an ID of the Firebase doc that I use to store some extended metadata. I am curious if it is possible to do it without another system... – Eugene Aug 18 '22 at 04:43
  • So, the randomly generated ID from Firestore or Realtime Database is also essentially a UUID generated on the client app (not on the server). It could also have collisions, [also astronomically low](https://stackoverflow.com/questions/54268257/what-are-the-chances-for-firestore-to-generate-two-identical-random-keys). – Doug Stevenson Aug 18 '22 at 12:11
  • I assumed it was like rowId in relational DB - guaranteed unique. Anyways, I will ignore the problem for now. I added date to the uploads folder so risks are even lower now. – Eugene Aug 18 '22 at 18:09

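Isn't this exactly what preconditions are for? With `ifGenerationMatch` set to `0`, Cloud Storage only accepts the write if no live object with that name exists yet; otherwise the request fails with HTTP 412 Precondition Failed, so a UUID clash becomes an error instead of a silent overwrite.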

Adapted from the docs: https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-nodejs

    const { Storage } = require('@google-cloud/storage');
    const storage = new Storage();

    async function uploadFile(bucketName, filePath, destFileName) {
        // 0 means "the destination object must not exist yet".
        const generationMatchPrecondition = 0;
        const options = {
            destination: destFileName,
            // Optional:
            // Set a generation-match precondition to avoid potential race conditions
            // and data corruptions. The request to upload is aborted if the object's
            // generation number does not match your precondition. For a destination
            // object that does not yet exist, set the ifGenerationMatch precondition to 0.
            // If the destination object already exists in your bucket, instead set a
            // generation-match precondition using its generation number.
            preconditionOpts: { ifGenerationMatch: generationMatchPrecondition },
        };

        await storage.bucket(bucketName).upload(filePath, options);
        console.log(`${filePath} uploaded to ${bucketName}`);
    }
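A sketch of how the same precondition could be applied to the question's streaming upload, retrying under a fresh UUID on a clash. It assumes that the library's `createWriteStream()` accepts the same `preconditionOpts` as `upload()`, and that a rejected write surfaces as an error whose `code` is 412. `FakeBucket`, `uploadUnique` and the `makeStream` factory are hypothetical; the fake only models the `ifGenerationMatch: 0` check so the sketch runs without credentials, and against a real bucket you would pass the library's bucket object instead.

```javascript
const { Writable } = require('stream');
const { pipeline } = require('stream/promises');
const { randomUUID } = require('crypto');
const { extname } = require('path');

// Hypothetical in-memory stand-in for a Cloud Storage bucket. It only models
// the "ifGenerationMatch: 0" (object must not exist yet) precondition.
class FakeBucket {
  constructor() {
    this.objects = new Map();
  }

  file(name) {
    const objects = this.objects;
    return {
      createWriteStream({ preconditionOpts = {} } = {}) {
        const chunks = [];
        return new Writable({
          write(chunk, _encoding, callback) {
            chunks.push(chunk);
            callback();
          },
          // The existence check and the store happen in one synchronous step,
          // so no other write can slip in between them.
          final(callback) {
            if (preconditionOpts.ifGenerationMatch === 0 && objects.has(name)) {
              const err = new Error('Precondition Failed');
              err.code = 412;
              return callback(err);
            }
            objects.set(name, Buffer.concat(chunks));
            callback();
          },
        });
      },
    };
  }
}

// Upload under a fresh UUID; if the name is already taken (412), retry with a
// new one. makeStream must return a NEW readable per attempt, because a stream
// that has already been consumed cannot be replayed.
async function uploadUnique(bucket, makeStream, fileName, contentType,
                            newId = randomUUID, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const objectName = `/uploads/${newId()}${extname(fileName)}`;
    const file = bucket.file(objectName);
    try {
      await pipeline(
        makeStream(),
        file.createWriteStream({
          contentType,
          preconditionOpts: { ifGenerationMatch: 0 },
        }),
      );
      return objectName;
    } catch (err) {
      if (err.code !== 412) throw err; // only a name clash is worth retrying
    }
  }
  throw new Error(`No free object name after ${maxAttempts} attempts`);
}
```

Unlike a separate "does it exist?" check, the precondition is evaluated by the storage service as part of the write itself, which is what closes the race the question describes.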
Durian