After some discussion on my earlier question about base64 not being safe for Firestore IDs, I would like to know how to encode a string into a Firestore "safe" document ID.

Here is the problem:

  • I am logging in users via a custom authentication service.
  • That service provides usernames that can contain /, which is not allowed in Firestore document IDs, as stated in the Firestore documentation.

I asked about base64 in another question, and it is not safe because its output can contain /.

So what would be a safe way to encode that string without losing the entropy of the username that the external service provides? For example, there could be one username dimi/test1 and another dimitest1, so just stripping out characters is not an option.

Also, since that service has an open API and my service exposes document IDs via URLs, I would like to avoid exposing the other service's usernames in my app's URLs.

Any suggestions?

Jimmy Kane
  • Can I ask why you want (as the user) to generate the ID? Firestore already has a unique, safe ID generator, and so does Firebase Auth; either should give you unique IDs for your needs. – Ignacio Bustos Jul 07 '20 at 07:19
  • @IgnacioBustos I am adding items to a queue that I need to access without querying. If I can always build the same ID, I can access the item faster and with fewer reads/writes. Imagine an API service / function that gets about 100K calls per day. Does this help? – Jimmy Kane Jul 07 '20 at 08:23
  • @IgnacioBustos Additionally, this way I won't ever insert a duplicate queue item. The input I get from API calls is: USERNAME/WORKOUTID. – Jimmy Kane Jul 07 '20 at 08:25
  • @IgnacioBustos You can also consider it a mapping to each service's user's workout. So even if the user on that service updates the info, the same queue item is updated and later consumed. – Jimmy Kane Jul 07 '20 at 08:26
  • Oh, I get that; it is very expensive to call Firebase for this. Let me assume a few more things, and correct me if I'm wrong. 'userID' (A) and 'workoutID' (B) are known, aren't they? In that case, your new ID can be `${userID}${workoutID}` without any slash `/` in it. In case you don't know the IDs beforehand, I have another approach to generate IDs yourself, fast and reliably, but it's mandatory to use Node.js for security reasons. Let me know and I will write up the approach. – Ignacio Bustos Jul 07 '20 at 12:15
  • I added a reply directly in case you need to generate the IDs from scratch: https://stackoverflow.com/a/62775792/1240074 – Ignacio Bustos Jul 07 '20 at 13:00

3 Answers

2

Use encodeURI() followed by SHA256. This constrains the document ID so that it satisfies Firestore's requirements:

Must be valid UTF-8 characters
Must be no longer than 1,500 bytes
Cannot contain a forward slash (/)
Cannot solely consist of a single period (.) or double periods (..)
Cannot match the regular expression __.*__
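The constraints listed above can be checked mechanically. Here is a minimal sketch for Node.js; `isValidDocumentId` is a hypothetical helper name, not part of any Firebase SDK, and using encodeURIComponent() to detect strings that cannot be encoded as UTF-8 is just one possible approach.

```javascript
// Hypothetical helper (not a Firebase API): checks a candidate ID against
// the constraints listed above.
function isValidDocumentId(id) {
  // Strings with lone surrogates cannot be encoded as UTF-8;
  // encodeURIComponent throws a URIError for them.
  try {
    encodeURIComponent(id);
  } catch (e) {
    return false;
  }
  // No longer than 1,500 bytes when encoded as UTF-8.
  if (Buffer.byteLength(id, 'utf8') > 1500) return false;
  // No forward slash.
  if (id.includes('/')) return false;
  // Not solely a single or double period.
  if (id === '.' || id === '..') return false;
  // Must not match __.*__ (the `s` flag lets `.` match newlines too).
  if (/^__.*__$/s.test(id)) return false;
  return true;
}

console.log(isValidDocumentId('dimi/test1')); // false, contains a slash
console.log(isValidDocumentId('dimitest1'));  // true
```

Note that this sketch only covers the five rules in the list; it does not add any other checks.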

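encodeURI() percent-encodes the input as UTF-8, so the string passed to the hash contains only valid characters.

SHA256 has a fixed length of 256 bits (32 bytes, or 64 characters when hex-encoded), so it stays well under the 1,500-byte limit.

A hex-encoded SHA256 digest only uses characters from [a-fA-F0-9] according to https://stackoverflow.com/a/12618366/3073280.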

Lastly, you mentioned that it will need entropy. SHA256 is well diffused.
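A minimal Node.js sketch of this approach, assuming the built-in crypto module is available; `toDocumentId` is a hypothetical name chosen for illustration:

```javascript
const crypto = require('crypto');

// Hypothetical helper: maps an arbitrary username to a 64-character
// lowercase hex string, which contains no '/' and is far below 1,500 bytes.
function toDocumentId(username) {
  // encodeURI throws a URIError if the string contains lone surrogates.
  const encoded = encodeURI(username);
  return crypto.createHash('sha256').update(encoded).digest('hex');
}

// The two usernames from the question map to different IDs.
console.log(toDocumentId('dimi/test1'));
console.log(toDocumentId('dimitest1'));
```

Because the hash is one-way, the original username is not readable from the ID in a URL, although anyone who already knows a username can recompute its ID.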

Jek
2

EDIT

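To transform strings into unique IDs very quickly, use crypto.createHash() instead. The result will always be the same for a given input string.

You can use MD5 or SHA256, as both take about the same time: roughly 2.2 s on average to compute 1 million IDs in my test. MD5 is no longer considered collision-resistant, so prefer SHA256 if the input comes from users you do not control.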

Here is the code:

const crypto = require('crypto');

function uniqueId(string, algorithm = 'md5') {
  return crypto.createHash(algorithm).update(string).digest('hex');
}

console.log('started');
console.time('generateIDsMD5')
for (let i = 0; i < 1000000; i++) {
  uniqueId('a string ' + i);
}
console.timeEnd('generateIDsMD5');

console.time('generateIDsSHA256')
for (let i = 0; i < 1000000; i++) {
  uniqueId('a string ' + i, 'sha256');
}
console.timeEnd('generateIDsSHA256');

// For instance, it takes around 2.2 s on average
// to generate 1 million unique IDs with MD5 or SHA256 hashing

console.log('MD5 string ', uniqueId('a string ' + 1));
console.log('MD5 sameString ', uniqueId('a string ' + 2));
console.log('MD5 sameString ', uniqueId('a string ' + 2));
console.log('SHA256 string ', uniqueId('a string ' + 1, 'sha256'));
console.log('SHA256 sameString ', uniqueId('a string ' + 2, 'sha256'));
console.log('SHA256 sameString ', uniqueId('a string ' + 2, 'sha256'));
console.log('finished');
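For the case described in the question's comments, where the input is a username plus a workout ID, the same idea can be sketched as follows. `idFromParts` is a hypothetical name, and serializing the parts with JSON.stringify() is an assumption made here so that the boundary between parts is preserved before hashing.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives one deterministic ID from several parts.
function idFromParts(...parts) {
  // JSON.stringify keeps the boundaries between parts, so
  // ['ab', 'c'] and ['a', 'bc'] produce different inputs to the hash.
  const serialized = JSON.stringify(parts);
  return crypto.createHash('sha256').update(serialized).digest('hex');
}

// The same input always yields the same ID.
console.log(idFromParts('dimi/test1', 'workout42'));
console.log(idFromParts('dimi/test1', 'workout42'));
```

Plain concatenation would hash 'ab' + 'c' and 'a' + 'bc' to the same ID, which is why the sketch serializes the parts first.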

PREVIOUS ANSWER

I adapted the code from Firebase so that it runs directly in Node.js, and added a small benchmark. It takes up to 3 s for 1 million IDs and only about 300 ms for 100,000 IDs, which matches your expected daily volume.

This uses the crypto module, which is considered very safe when run in a Node.js environment.

Here is the function with a usage example:

const crypto = require('crypto');

function autoId(length) {
  const chars =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let id = '';
  while (id.length < length) {
    const bytes = crypto.randomBytes(40);
    bytes.forEach(b => {
      // Length of `chars` is 62. We only take bytes between 0 and 62*4-1
      // (both inclusive), so the modulo below maps evenly onto the indices
      // of `chars`; larger bytes are discarded to avoid bias.
      const maxValue = 62 * 4 - 1;
      if (id.length < length && b <= maxValue) {
        id += chars.charAt(b % 62);
      }
    });
  }
  return id;
}

console.log('started');
console.time('generateIDs')
for (let i = 0; i < 1000000; i++) {
  autoId(20);
}
console.timeEnd('generateIDs');
// For instance, It will take around 3s average
// to generate 1 Million Unique IDs with 20 bytes length

console.log('example 20bytes ', autoId(20));
console.log('example 40bytes ', autoId(40));
console.log('example 60bytes ', autoId(60));
console.log('finished');

Run it with node thisfile.js to see the result.

Since Firebase is mostly open source, we can find the official unique ID generator used in Node.js here: https://github.com/googleapis/nodejs-firestore/blob/4f4574afaa8cf817d06b5965492791c2eff01ed5/dev/src/util.ts#L52

IMPORTANT

If you are going to join 2 IDs, do not use a slash /, since it is not allowed. Use an underscore _ instead, or no separator at all: you control the length of each ID, so you know where to split the joined ID (40 bytes contain 2 IDs of 20 bytes each, for instance).

Firestore limits document IDs to 1,500 bytes, so you have plenty of room to play with.
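Joining and splitting fixed-length IDs as described above could look like this. It is only a sketch: `ID_LENGTH`, `joinIds` and `splitIds` are hypothetical names, and it assumes both IDs always have exactly the same known length.

```javascript
// Assumed fixed length of each individual ID.
const ID_LENGTH = 20;

// Concatenates two fixed-length IDs without any separator.
function joinIds(first, second) {
  if (first.length !== ID_LENGTH || second.length !== ID_LENGTH) {
    throw new RangeError(`each ID must be exactly ${ID_LENGTH} characters`);
  }
  return first + second;
}

// Recovers the two original IDs from the joined string.
function splitIds(joined) {
  return [joined.slice(0, ID_LENGTH), joined.slice(ID_LENGTH)];
}

const joined = joinIds('A'.repeat(20), 'b'.repeat(20));
console.log(joined.length);    // 40
console.log(splitIds(joined)); // the two original 20-character IDs
```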

More info: https://firebase.google.com/docs/firestore/quotas#limits

Ignacio Bustos
  • I know you took some time to write this, and I'll be awarding you for that for sure. However (and answering the comment as well), I get the IDs from others (an API). `${userID}${workoutID}` would not work because the URI can be exposed later on, and someone could read those IDs from a shared URI. So I would like something like, let's say, "Base58?" Crypto? Two inputs (USERID, WorkoutID) should give a unique Firestore ID. From what I understand, this answer does not do that, right? I appreciate that you took the time to teach me how to create IDs (and I will use it), but mine are given :-/ – Jimmy Kane Jul 07 '20 at 13:30
  • To be a little more helpful, maybe: https://github.com/jimmykane/quantified-self/blob/master/functions/src/queue.ts#L74 See how a queue item is inserted. – Jimmy Kane Jul 07 '20 at 13:33
  • Ok I see, so you want the `generateIDFromParts` to generate the same string while the input is the same, no matter how much you call the function again with the same input? – Ignacio Bustos Jul 07 '20 at 13:42
  • Correct! Right now I Base58-encode. According to my research that should be fine, but I am worried about collisions etc. Also, my own answer is downvoted, so it is starting to smell, if you get my point. – Jimmy Kane Jul 07 '20 at 13:53
  • you can pass both Ids concatenated like `uniqueId(\`${userID}${workoutID}\`)` to get the same result always – Ignacio Bustos Jul 07 '20 at 14:10
  • you can also digest the hash in a different way to get more complex unique results like 'utf-8', more info https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings – Ignacio Bustos Jul 07 '20 at 14:13
-1

I used Base58; it was the safest option I could find in my research.
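For illustration, a Base58 encoder can be sketched in a few lines of Node.js. This is an assumption about what such an encoder might look like, not the code actually used: `base58Encode` is a hypothetical name and the alphabet is the one popularized by Bitcoin, which omits 0, O, I and l as well as /.

```javascript
// Bitcoin-style Base58 alphabet (assumed here): no 0, O, I, l and no '/'.
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Hypothetical helper: Base58-encodes the UTF-8 bytes of a string.
function base58Encode(text) {
  const bytes = Buffer.from(text, 'utf8');

  // Interpret the bytes as one big-endian integer.
  let value = 0n;
  for (const byte of bytes) {
    value = value * 256n + BigInt(byte);
  }

  // Repeatedly divide by 58, collecting digits from least significant.
  let encoded = '';
  while (value > 0n) {
    encoded = ALPHABET[Number(value % 58n)] + encoded;
    value /= 58n;
  }

  // Each leading zero byte is represented by the first alphabet character.
  for (const byte of bytes) {
    if (byte !== 0) break;
    encoded = ALPHABET[0] + encoded;
  }
  return encoded;
}

console.log(base58Encode('dimi/test1'));
console.log(base58Encode('dimitest1'));
```

Base58 is a reversible encoding rather than a hash: distinct inputs never collide, but anyone can decode the ID back to the original username.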

Jimmy Kane
  • Why not use a hash, e.g. SHA256? It is fixed-length and its character pool is `[a-fA-F0-9]` https://stackoverflow.com/a/12618366 while preserving the entropy. – Jek Feb 13 '20 at 16:16