31

Please take a look at this table :

http://www.mediawiki.org/wiki/Manual:Logging_table

As you can see wikipedia use varbinary instead of varchar :

| log_type      | **varbinary**(32)       | NO   | MUL |                |
| log_action    | **varbinary**(32)       | NO   |     |                |
| log_timestamp | **binary**(14)          | NO   | MUL | 19700101000000 |
| log_user      | int(10) unsigned        | NO   | MUL | 0              |  
| log_user_text | **varbinary**(255)      |      |     |                |

All of these information are text , so why they save them as binary ?

They do this for all tables .

Machavity
  • 30,841
  • 27
  • 92
  • 100
user1411084
  • 502
  • 1
  • 4
  • 11
  • 2
    [Here](http://stackoverflow.com/questions/5978484/what-is-the-advantage-of-using-varbinary-over-varchar-here) talks about varbinary using less space then varchar. – threenplusone Nov 15 '12 at 08:22
  • @threenplusone - That is in the context of a particular puzzle. Rather than converting the string `1.2.3` to `'100000000011000000000210000000003'` by padding each component out to the max length of an int it uses a more compact binary representation `0x020000000102000000020200000003` – Martin Smith Nov 15 '12 at 08:43

2 Answers2

19

Mediawiki changed from varchar to varbinary in early 2011:

War on varchar. Changed all occurrences of varchar(N) and varchar(N) binary to varbinary(N). varchars cause problems ("Invalid mix of collations" errors) on MySQL databases with certain configs, most notably the default MySQL config.

Timo Tijhof
  • 10,032
  • 6
  • 34
  • 48
Sjoerd
  • 74,049
  • 16
  • 131
  • 175
  • 19
    It looks like they fixed the problem at the wrong side. They silenced the symptom instead of fixing the underlying collations issue. – usr Nov 15 '12 at 12:36
8

In MSSQL:

I think the big difference is only between nvarchar and varbinary.

Because nvarchar stores 2 bytes for each character instead of 1 byte.

varchar does the same as varbinary: from MSDN:

The storage size is the actual length of the data entered + 2 bytes" for both.

The difference here is by varbinary The data that is entered can be 0 bytes in length.

Here is a small example:

CREATE TABLE Test (textData varchar(255), binaryData varbinary(255))

INSERT INTO Test 
VALUES('This is an example.', CONVERT(varbinary(255),'This is an example.',0))
INSERT INTO Test 
VALUES('ÜŰÚÁÉÍä', CONVERT(varbinary(255),'ÜŰÚÁÉÍä',0))

What you can use here is the DATALENGTH function:

SELECT datalength(TextData), datalength(binaryData) FROM test

The result is 19 - 19 and 7 - 7

So in size they are the same, BUT there is an other difference. If you check the column specifications, you can see, that the varbinary (of course) has no collation and character set, so it could help use values from different type of encoding and character set easily.

SELECT 
  *
FROM   
  INFORMATION_SCHEMA.COLUMNS 
WHERE   
  TABLE_NAME = 'Test' 
ORDER BY 
  ORDINAL_POSITION ASC; 
András Ottó
  • 7,605
  • 1
  • 28
  • 38