73

How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :

1,2,4,8,16,32,64,128,255 (max)

A clarification via example. Lets say I have a varchar field of 20 characters. Does MySQL when creating this field, basically reserve space for 32 bytes(not sure if they are bytes or not) but only allow 20 to be entered?

I guess I am worried about optimising disk space for a massive table.

ae.
  • 1,670
  • 1
  • 14
  • 19

2 Answers2

61

To answer the question, on disk MySql uses 1 + the size that is used in the field to store the data (so if the column was declared varchar(45), and the field was "FooBar" it would use 7 bytes on disk, unless of course you where using a multibyte character set, where it would be using 14 bytes). So, however you declare your columns, it wont make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHAR's are converted to CHAR's when MySql makes a temporary table (SORT, ORDER, etc) and the more records you can fit into a single page, the less memory and faster your table scans will be.

Kris Erickson
  • 33,454
  • 26
  • 120
  • 175
  • hm ? so, on multibyte charset it's ((size) + 2) ? – andyk Jul 20 '09 at 06:45
  • 4
    No, on multibyte it is (size * 2) + 1. – Kris Erickson Jul 20 '09 at 13:49
  • 3
    @Kris: I believe it can also be (size * 3) + 1 or even (size * 4) + 1, no? – Alix Axel Jan 04 '10 at 04:46
  • 3
    The statement on multibyte is very misleading. UTF-8 is a multibyte character encoding and yet for ASCII and Latin1 characters (the first 256 characters of UTF8) you will have a one byte-one character correspondence. UTF-16 will use, only for code points in the Basic Multilingual Plane, 2 bytes per code point. – anno Oct 18 '11 at 00:27
  • 1
    If the size is more than 255, the total space required should be size * N + 2. – Wizist Oct 30 '12 at 03:30
  • wizist is correct, as of MySQL 5.0.3 you can declare VARCHAR fields larger than 255 characters, and in those cases the size will be ((size * 2) + 2) bytes. Source: http://dev.mysql.com/doc/refman/5.0/en/char.html – Code Commander Jun 03 '13 at 15:24
  • @anno: ASCII is only the first 128 characters of UTF8. It needs at least one more bit to signal you're starting a multi-byte character. – Black Mantha Jul 11 '18 at 13:13
  • does it means that VARCHAR(45) vs VARCHAR(200) makes a difference in performance of queries like SORT, ORDER? Regardless of what is length of actual stored data? – Jaydip Kalkani Jun 01 '22 at 03:15
27

MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.

Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).

This manual page has a more detailed explanation.

Edit: With regards to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead). So it doesn't matter if you have a varchar(16) or a varchar(128), if you store a 10-character string in it, you're only going to use 10 bytes (plus 1 or 2) of disk space.

zombat
  • 92,731
  • 24
  • 156
  • 164
  • 39
    Actually access wise it makes a huge difference, since when MySql creates the result set in memory it turns it into a matrix, thus if your largest field is 16 characters, but the field is defined as 128 characters you have to allocate an extra 112 characters for every row. This wastes a lot a memory. – Kris Erickson Jul 20 '09 at 05:22
  • 3
    Not only do you have to allocate an extra 112 characters for each row in memory, but you will have a great deal more L1 cache misses when each row is iterated over. This is compared to a VARCHAR(16) column type, where you could squeeze a lot more rows with this column into high-speed cache under-the-hood. – The Aelfinn Feb 08 '17 at 21:13