0

Is it better to set an index (primary or secondary) on a varchar(x) or varbinary(x) column?

I wasn't sure it even mattered. However what I see online (Googled - varchar vs varbinary) is that varchar is almost dead or being pushed to the way side. So is this better to index or something? Could it be the type of index?

Excellent scenario: Indexing email addresses ([edit] encrypted byte array {varbinary} or string equivalent {varchar})

Thanks

Answer?

It seems that indexes on varbinary is the worst thing ever. Am I reading this right?

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/35b61bb0-1fa8-4a2f-a9fb-729a1874dcf8/clustered-index-on-a-varbinary-column

Matt Gibson
  • 37,886
  • 9
  • 99
  • 128
AnswerIsNot42
  • 25
  • 1
  • 7
  • 2
    Just something to keep in mind if you're deciding between text and binary for your column - binary will result in case-sensitive queries, and text (by default at least) would not. If that's important for your requirements, then it would be more important than deciding based on index performance. – Joe Enos May 30 '15 at 00:01
  • 1
    And what are you reading that says that varchar is dead? I think you must be misunderstanding whatever it is. – Joe Enos May 30 '15 at 00:02
  • Interesting. Case-sensitive, so does that mean it's faster like searching an exact object? Hypothetically would it be better encrypt the email and then put it in? ...excluding the obvious security benefits, only looking at structure – AnswerIsNot42 May 30 '15 at 00:08
  • Those are ***TWO TOTALLY*** different things! No way to compare them directly. `varbinary` is for **binary** data - like a PDF file, a video stream, a MP3 file etc. while `varchar(x)` is for **text** data - strings, paragraphs, entire book sections. Those have **nothing** to do with one another..... and `varchar(x)` is **most definitely** nowhere near "dead" ..... – marc_s May 30 '15 at 03:04
  • Understood, **but while** encrypting it will be a byte array.... so now I have the email (encrypted) as a byte array -- should that be converted to varchar? – AnswerIsNot42 May 30 '15 at 16:44
  • The original discussion you linked is about avoiding using varchar or varbinary as primary key. You can use an identity primary key and create an unique index on your email (or encrypted email data). Using long varchar or varbinary as primary key has some issues, especially the email value itself provides no clustering benefits. – Tim3880 May 30 '15 at 17:31

2 Answers2

1

It is better to create an index on varchar than varbinary. Varbinary is suitable for blobs but you can store strings in varbinary also. Such blobs are complementary to your actual data. Your own research lead to that conclusion also.

An email address can be entered by user in variety of formats - abc@xyz.com or Abc@Xyz.com etc. It is easier to store/extract such information in/from varchar field. Joe Enos is absolutely right that binary comparisons will be case-sensitive (comparing binary info) whereas varchar will be case-insensitive assuming that's how you have set up your DB and column collation. With varbinary, you'll also have to be careful about padding.

Varchar is alive and healthy. When you index varchar(100), try to use a non-clustered index. My general preference is to use a surrogate key in most situations as clustered index.

zedfoxus
  • 35,121
  • 5
  • 64
  • 63
  • Well that explains the case issue, thanks. So expanding on this, if while encrypting an email address I have a byte array. Would it be faster performance if I stored that byte array as varbinary or converted to string and stored it in varchar? ...excluding the conversion time; imagine a db with 300k users and this is a key look up field – AnswerIsNot42 May 30 '15 at 16:52
  • I don't believe there will be any significant difference. However, converting binary data to text format usually leads to bigger data, which makes index a little bit slower. – Tim3880 May 30 '15 at 17:24
  • It's a good, but not an easy question to answer. I'd salt+sha-256 hash the email address and store the result in a char field. Index that field to do comparisons. Recognize that there *are* chances of collision, although rare. Encrypt and store email in another field - varbinary is good for that. Review [question 70450](http://stackoverflow.com/questions/70450/is-it-worth-encrypting-email-addresses-in-the-database) also – zedfoxus May 30 '15 at 17:29
  • @Tim3880 Milliseconds matter to me. Tiny things make a big difference (that's what she said). I will take that as my final answer. ...so arguably the varchar would be slower. – AnswerIsNot42 May 30 '15 at 17:43
1

The correct datatype to use is dictated by the contents of the column.

while encrypting an email address I have a byte array

This is binary data and should be stored as varbinary (except if the encryption routine outputs a fixed length array then use binary).

casting the varbinary to varchar would be of no benefit at all when it comes to storage (everything is stored as binary anyway) and would be likely to cause incorrect results.

The following all return "yes" in my default collation

SELECT 
       CASE
         WHEN cast(0xE6 AS VARCHAR(20))  = cast(0x6165 AS VARCHAR(20))
           THEN 'yes' else 'no'
       END,
       CASE
         WHEN cast(0xcc0000 AS VARCHAR(20))  = cast(0xcc2020 AS VARCHAR(20))
           THEN 'yes' else 'no'
       END,
       CASE
         WHEN cast(0x202020 AS VARCHAR(20)) = cast(0x AS VARCHAR(20))
           THEN 'yes' else 'no'
       END

Adding a COLLATE LATIN1_GENERAL_BIN alters the result of the first two but not the third.

I would expect looking up a varbinary(n) column to be faster than looking up a varchar(n) column generally as the comparison routines are simpler though perhaps not much in it for a varchar column with a binary collation.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845