3

Records are identified by 32 byte SHA-2 value. Using BINARY is more compact (8 bits per byte vs 4 bits per byte) but that is offset by the increased difficulty in using binary value in adhoc queries (troubleshooting, etc). I am leaning towards storing as hex strings in 64 byte CHAR field. Are there any other issues or concerns beyond the 4 byte 32 byte per record overhead?

Update: Based on comments I did some testing in SQL 2012 and there is no longer any 'difficulty' in adhoc queries (SSMS) involving binary keys. My experience on the topic was a little dated. In SSMS for SQL Server 2000 there was no implicit conversion from binary to hex strings. So it became necessary to explicitly convert between binary and hex strings by functions to get any meaningful results. It looks like that changed in SQL Server 2005.

Gerald Davis
  • 4,541
  • 2
  • 31
  • 47
  • 1
    64 chars vs. 32 byte is 32 bytes per row overhead. And those 32 bytes per row overhead might be carried over to secondary indexes, too. And keying in a 64 characters hash string is always bad... – dnoeth May 13 '15 at 18:35
  • If the records are being used for a foreign key reference, then use an identity column for the table with the SHA-2 value simply being an index column. – Gordon Linoff May 13 '15 at 18:37
  • @GordonLinoff Any specific reason vs using the hash as the primary key? – Gerald Davis May 13 '15 at 18:39
  • @MartinSmith Well that might have 'solved' it. I was thinking from memory that SSMS returned binary as a sequence of ASCII bytes but that might not have been the case since SQL Server 2000. I just ran some tests and it looks like implicit conversion in SSMS is good both ways and that is the only place there is going to be any adhoc queries. – Gerald Davis May 13 '15 at 18:42
  • 1
    To reduce the overhead when you use foreign keys. I also think that it adds an iota more security for the database to use internal ids rather than external ids for row identification. – Gordon Linoff May 13 '15 at 18:46
  • @GordonLinoff I can understand that and have heard that view expressed before but in most cases i refrain from making an extra Identity column when the data already has a 'natural' primary key. The FK angle is one I hadn't considered. Depending on how many tables use this key as a FK it could be significant. In this case however that doesn't apply. Still +1 for making me think about the FK considerations. – Gerald Davis May 14 '15 at 14:00

1 Answers1

1

I was looking for a similar answer to this question. However it was a question of storing a GUID as binary or varchar in a database. Different type of object but same principle.

Here is some more information: How should I store GUID in MySQL tables?

I also found a useful article that compared the performance of querying table with millions of rows with binary primary key and millions as varchar. The search performance was huge when using binary so I use this now when I know I'm going to be dealing with a lot of rows. Unfortunately I was unable to find the original article.

Once you get everything setup to manage the binary information it's no more work than using a varchar.

Community
  • 1
  • 1
Simon Darlow
  • 530
  • 5
  • 14
  • That is a good point and as pointed out by @dnoeth, the major pro for using varchar/char doesn't really apply anymore. I was coming from some dated knowledge (haven't been in a binary primary key situation since SQL 2000). As I 'fondly' recall back then Selects against binary fields would be returns as ASCII (so yes something like this '2&@ !Kwk#') requiring the use of functions to convert it to hex strings. That isn't the case anymore (since SQL 2005?). So throw in performance considerations and the extra space, and the case for binary is starting to be pretty one sided. – Gerald Davis May 14 '15 at 13:56
  • I noticed that too in MySQL. When I queried the binary field which was populated with a string, it returned the string in it's original format. Even though it's stored as a blob. Glad I could be of assistance. – Simon Darlow May 15 '15 at 03:13