
I am storing some data in an external key-value store that is used as a cache. Because of the nature of the data, we need to encrypt or hash the keys as well as the values. We are using the ASP.NET Core Data Protection APIs for encryption and decryption with the default algorithm (AES-256-CBC). As far as I know, with this setup encrypting the same plaintext twice does not produce the same ciphertext, so I can't encrypt the keys: the next time, I won't get the same encrypted key to look up.
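To see why a randomized cipher can't produce lookup keys, here is a minimal sketch using raw AES-256-CBC from System.Security.Cryptography. It is not the Data Protection implementation, which also adds a MAC and a key id to each payload. The class name RandomIvDemo is hypothetical. Because a fresh random IV is generated on every call, two encryptions of the same plaintext practically never match:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration only: raw AES-256-CBC, NOT the Data Protection internals.
public static class RandomIvDemo
{
    // Encrypts plaintext under the given 32-byte key with a newly generated IV
    // and returns IV || ciphertext so the result could be decrypted later.
    public static byte[] EncryptWithRandomIv(byte[] key, string plaintext)
    {
        using Aes aes = Aes.Create();
        aes.Key = key;                    // 32-byte key => AES-256
        aes.Mode = CipherMode.CBC;
        aes.Padding = PaddingMode.PKCS7;
        aes.GenerateIV();                 // new random 16-byte IV on every call

        byte[] data = Encoding.UTF8.GetBytes(plaintext);
        using ICryptoTransform encryptor = aes.CreateEncryptor();
        byte[] cipher = encryptor.TransformFinalBlock(data, 0, data.Length);

        // Prepend the IV; it is not secret, but decryption needs it.
        byte[] result = new byte[aes.IV.Length + cipher.Length];
        Buffer.BlockCopy(aes.IV, 0, result, 0, aes.IV.Length);
        Buffer.BlockCopy(cipher, 0, result, aes.IV.Length, cipher.Length);
        return result;
    }
}
```

Encrypting a key such as the hypothetical "customer:42" twice yields two different byte arrays, so the second encryption can never be used to find the entry written under the first.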

If we hash the keys (using SHA-256) instead of encrypting them, we can solve this problem, but in rare cases hashing can produce collisions, and because of the nature of our data we can't afford even a single collision. Code example:

using Microsoft.AspNetCore.DataProtection;

public class MyClass
{
    private readonly IDataProtector dataProtector;
    private readonly ISomeStore externalStore; // our key-value store abstraction

    public MyClass(IDataProtectionProvider dataProtectionProvider, ISomeStore externalStore)
    {
        this.dataProtector = dataProtectionProvider.CreateProtector("somePurposeString");
        this.externalStore = externalStore;
    }

    public string GetOrAddValue(string someKey)
    {
        // The problem: Protect() is randomized, so this yields a different
        // encryptedKey on every call and the lookup below never hits.
        string encryptedKey = this.dataProtector.Protect(someKey);

        if (this.externalStore.KeyExists(encryptedKey))
        {
            string encryptedValue = this.externalStore.Get(encryptedKey); // look up in the cache
            return this.dataProtector.Unprotect(encryptedValue);
        }
        else
        {
            string someValue = GetValue(someKey); // fetch from the source of truth (defined elsewhere)
            this.externalStore.Set(encryptedKey, this.dataProtector.Protect(someValue)); // set the value in the cache
            return someValue;
        }
    }
}
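Hashing the key instead of calling Protect on it, as described above, gives a deterministic lookup key. A minimal sketch, assuming .NET 5 or later for SHA256.HashData and Convert.ToHexString; KeyHasher, HashKey and HmacKey are hypothetical names. HmacKey is a variation not mentioned in the question: a keyed hash (HMAC-SHA256 with a secret) so that someone who can read the store cannot confirm guesses of low-entropy keys by hashing them.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class KeyHasher
{
    // Same input always gives the same 64-character upper-case hex string,
    // so it can be used directly as the key in the external store.
    public static string HashKey(string key)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(key));
        return Convert.ToHexString(digest);
    }

    // Hypothetical variation: deterministic too, but depends on a secret,
    // so the store contents alone don't let anyone test guessed keys.
    public static string HmacKey(byte[] secret, string key)
    {
        using var hmac = new HMACSHA256(secret);
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(key)));
    }
}
```

In GetOrAddValue, the line computing encryptedKey would then become something like `string lookupKey = KeyHasher.HashKey(someKey);`, while the value is still protected with dataProtector.Protect.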

Is there a way to solve this problem efficiently? The average lookup time from the external key-value store is around 100 ms.

Sunil Kumar
  • As far as I know, in CBC mode you get the same ciphertext from the same plaintext if the key and IV are also the same. – Alexey Rumyantsev Apr 12 '21 at 13:01
  • Do you know what default IV the Data Protection APIs in .NET Core use? – Sunil Kumar Apr 12 '21 at 13:26
  • The chance of collisions in SHA-256 is very small indeed. https://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice The chance of memory corruption changing your data in an undetectable way is FAR larger than the collision chance. So is the chance of an asteroid colliding with earth and destroying your data centers. You could go to SHA-384 or SHA-512 to gain extra collision proofing. – O. Jones Apr 12 '21 at 15:07
  • Thanks @O.Jones for your input. This seems quite logical now. I will go with hashing then. – Sunil Kumar Apr 12 '21 at 16:11
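Alexey Rumyantsev's point about CBC with a fixed key and IV can be checked with raw AES from System.Security.Cryptography. This is not something to do through Data Protection, which, as far as I understand its documentation, picks a fresh random IV for each Protect call and offers no way to fix it. FixedIvDemo is a hypothetical name:

```csharp
using System.Security.Cryptography;
using System.Text;

public static class FixedIvDemo
{
    // Deterministic AES-256-CBC: with the same key AND the same IV,
    // identical plaintexts always produce identical ciphertexts.
    // Caveat: reusing an IV means equal plaintexts (and equal leading
    // 16-byte blocks) give equal ciphertext, visible to anyone reading the store.
    public static byte[] Encrypt(byte[] key, byte[] iv, string plaintext)
    {
        using Aes aes = Aes.Create();
        aes.Key = key;               // 32-byte key => AES-256
        aes.IV = iv;                 // fixed, caller-supplied 16-byte IV
        aes.Mode = CipherMode.CBC;
        aes.Padding = PaddingMode.PKCS7;

        byte[] data = Encoding.UTF8.GetBytes(plaintext);
        using ICryptoTransform encryptor = aes.CreateEncryptor();
        return encryptor.TransformFinalBlock(data, 0, data.Length);
    }
}
```

This is deterministic encryption: it makes lookup possible precisely because it reveals when two keys are equal, so it gives weaker guarantees than what Protect provides.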
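O. Jones's point about collision odds can be quantified with the standard birthday-bound approximation for n uniformly random b-bit hash values. The figure of n = 10^9 keys is an illustrative assumption, not a number from the question:

```latex
$$
p \;\approx\; 1 - e^{-n(n-1)/2^{b+1}} \;\approx\; \frac{n^2}{2^{b+1}},
\qquad n = 10^{9},\; b = 256:\quad
p \approx \frac{10^{18}}{2^{257}} \approx \frac{10^{18}}{2.3 \times 10^{77}} \approx 4 \times 10^{-60}.
$$
```

Moving to SHA-512 (b = 512) shrinks this by a further factor of 2^256.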

1 Answer


I don't know how large your data is, but you can use hashing as follows so that a collision can never return the wrong value.

  1. Hash the original key and use the hash as the key in your external store.
  2. Make the stored value a dictionary of key:value pairs, where each key is an original (unhashed) key and each value is the corresponding original value.
  3. Encrypt the value (now a dictionary) before storing it in the store.
  4. On later lookups, first hash the original key and check whether the store has an entry for that hash. If it does, decrypt the dictionary and look up the original key in it; if it is there, you have your value. If the original key is not in the dictionary (a different key with the same hash), add the new key and value to the dictionary, encrypt the whole dictionary again and store it. If there is no entry for the hash at all, store a new dictionary containing just this key and value.
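The steps above can be sketched in C#. Assumptions, all hypothetical: the store is modelled as an IDictionary<string, string> standing in for ISomeStore, protect/unprotect are delegates that would wrap dataProtector.Protect/Unprotect in real code, buckets are serialized with System.Text.Json, and .NET 5 or later is available for SHA256.HashData.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public sealed class BucketCache
{
    private readonly IDictionary<string, string> store;   // stands in for ISomeStore
    private readonly Func<string, string> protect;         // e.g. dataProtector.Protect
    private readonly Func<string, string> unprotect;       // e.g. dataProtector.Unprotect

    public BucketCache(IDictionary<string, string> store,
                       Func<string, string> protect, Func<string, string> unprotect)
    {
        this.store = store;
        this.protect = protect;
        this.unprotect = unprotect;
    }

    private static string HashKey(string key) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(key)));

    public string GetOrAddValue(string originalKey, Func<string, string> getValue)
    {
        string hashedKey = HashKey(originalKey);   // step 1: the hash is the store key

        var bucket = new Dictionary<string, string>();
        if (store.TryGetValue(hashedKey, out var encryptedBucket))
        {
            // Step 4: decrypt the bucket and match on the ORIGINAL key, so two
            // keys that share a hash are never confused with each other.
            bucket = JsonSerializer.Deserialize<Dictionary<string, string>>(unprotect(encryptedBucket))
                     ?? new Dictionary<string, string>();
            if (bucket.TryGetValue(originalKey, out var cached))
                return cached;
        }

        // Miss, or a different key with the same hash: add the entry (step 2),
        // then encrypt the whole bucket (step 3) and write it back.
        string value = getValue(originalKey);
        bucket[originalKey] = value;
        store[hashedKey] = protect(JsonSerializer.Serialize(bucket));
        return value;
    }
}
```

The read-modify-write in the miss path is not atomic: if two callers update the same bucket concurrently, one write could overwrite the other unless the store offers some compare-and-set mechanism.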

This reduces the chance of returning a wrong value because of a collision to 0, but it increases the payload size (each stored value now also carries its original key), which may not be desirable in your case.

  • Thanks for this solution; it sounds interesting. I will go with simple hashing of the key, as O. Jones pointed out in the comment, because given how unlikely collisions are, there are other things to worry about than a collision itself. – Sunil Kumar Apr 23 '21 at 08:24