0

I use STANDARD_HASH in the manner below to hash credit card numbers. It returns hashes with 40 characters. This seems excessive for credit card numbers which have 16 digits. I would like to save space in my export. How can I create shorter hashes while still achieving these goals:

  1. Have the same level of security and non-reversibility as STANDARD_HASH
  2. Keep the likelihood of two card numbers receiving the same hash very small (though if this happens a few times, it's OK)
  3. Have the shortest possible hash result in terms of characters or space required when exporting to a CSV
  4. Perform this operation while using as few database resources as possible
  5. Perform this operation using read-only access to the database

If a method exists which achieves goals 2 and 3, then I expect that goal 1 could be achieved by using this method to hash the output of STANDARD_HASH.

SELECT STANDARD_HASH(TRIM('  123456789123456789  ' )) FROM DUAL;

TRIM removes the spaces and then STANDARD_HASH returns a hash of length 64.

Here's the same example on db<>fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=7cd086f1b60f69eb3bc6f54d4a211844

The database version is "Oracle Database 18c Enterprise Edition".

Bobby
  • 1,585
  • 3
  • 19
  • 42
  • Are you storing so many credit cards that this becomes a storage problem or do you have a security threat in mind? – Artjom B. Oct 28 '20 at 19:38
  • @ArtjomB. I want the CSV export to be as small as possible. Any kind of one-way hash is sufficient for security, so it's more about the storage issue. – Bobby Oct 28 '20 at 19:43

1 Answers1

2

That length of 64 is not the length of the result, but just how it displays. STANDARD_HASH returns a RAW value, that is displayed as hexadecimal.

You can convert this raw value into something usable using the UTL_RAW functions at https://docs.oracle.com/database/121/TTPLP/u_raw.htm#TTPLP71498

Eg

  SELECT UTL_RAW.CAST_TO_VARCHAR2 (STANDARD_HASH(TRIM('  123456789123456789  ' ))) FROM DUAL;

Note that when you try this in the fiddle, you’ll find a few ? that represent non-printable characters, so allow for that in your export.

Edit to add : STANDARD_HASH uses SHA1 by default - but that and MD5 have vulnerabilities - better to just add the extra parameter to STANDARD_HASH to use a longer SHA -see https://docs.oracle.com/database/121/SQLRF/functions183.htm#SQLRF55647

  SELECT UTL_RAW.CAST_TO_VARCHAR2 (STANDARD_HASH(TRIM('  123456789123456789  ' ), ‘SHA256’)) FROM DUAL;

Edit to address the 5 points :

  1. it uses the same STANDARD_HASH so is the same
  2. SHA1 is prone to collisions, so as above swap to SHA256 or higher
  3. STANDARD_HASH uses industry-standard hashing algorithms. It is what it is. Be aware that by its very nature, hashing returns binary values, so it is your responsibility to convert them to appropriate format - eg for CSV files, you can convert to Base64 (see Base64 encoding and decoding in oracle )
  4. and 5. No additional resources

Edit to respond to addition comments :

Yes, full SELECT you stated looks correct :

select utl_raw.cast_to_varchar2(utl_encode.base64_encode(
     STANDARD_HASH(TRIM('  123456789123456789  ' ), 'SHA1')))  FROM dual;

Base64 operates on groups of 3 bytes at a time, and appends "=" for each byte short. SHA1 hashes are always 20 bytes, so is always 1 byte short.

So offhand, you COULD trim that trailing "=" off - though I would advise against it (lean code beats premature optimisation). For example, if you subsequently decided to upgrade from SHA1 to SHA256, that generates hashes with a different number of bytes, and therefore potentially 0 or 2 "=" at the end, so weird bugs await.

Yes, "+" and "/" are valid characters in the Base64 output (along with 0-9, and upper-and lower- case letters - hence 64 characters in all, plus the =), but importantly commas and double-quotes are not - so yes, Base64 strings are safe to go into a CSV format.

FYI, a quick summary of Base64 (since I guess that you like me always like to have an overview of what I'm dealing with)

Base64 is used to translate a stream of binary data into printable strings. Now 3 bytes of binary data is 24 bits, which of course can be regarded as 4 lots of 6-bits (we can ignore the byte boundaries). Any collection of 6 bits has 2^6 = 64 possible values (hence the Base64 name), which are represented as 64 characters :

  • Upper-case letters
  • Lower case letters (so yes, case-sensitive).
  • digits 0-9
  • "+" and "/"

Hence each character in the Base64 output represents the next 6 bits of the binary data.

racraman
  • 4,988
  • 1
  • 16
  • 16
  • I’ll try this tomorrow. Does it achieve all 5 goals when I export to a CSV? How does the length relate to the length of the input? – Bobby Oct 27 '20 at 22:41
  • I've read through the links you posted and have come up with this potential solution: `select utl_raw.cast_to_varchar2( utl_encode.base64_encode( STANDARD_HASH(TRIM(' 123456789123456789 ' ), 'SHA1' ))) FROM dual;` If that looks reasonable to you, could you add it to the answer or make suggestions on how to improve it further to give shorter output? It returns strings of length 28, but the last is always an `=` so I could remove it with SUBSTR. I used SHA1 because the output is shorter and I calculated the probability of a collision at 3.4e-31. – Bobby Oct 28 '20 at 18:59
  • So this would be my final suggestion with SUBSTR, but I would still like to make the output shorter if possible and can accept a higher probability of collision if necessary. `select SUBSTR( utl_raw.cast_to_varchar2( utl_encode.base64_encode( STANDARD_HASH(TRIM(' 123456789123456789 ' ), 'SHA1' ))), 1, 27) FROM dual;` – Bobby Oct 28 '20 at 19:05
  • By the way, it seems the output is case-sensitive and sometimes includes other symbols like + and /. Is that correct. – Bobby Oct 28 '20 at 19:12