0

Background

The MySQL documentation states the following:

In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

To put this to the test myself, I created two tables:

CREATE TABLE `varchar_length_test_255` (
  `characters` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `varchar_length_test_256` (
  `characters` varchar(256) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I then inserted 10,000 rows into each table, each row with values having the maximum length for the characters column.

Since I am using a character set that has a maximum byte length of one byte per character (latin1), I expected to see a difference of 20,000 bytes in storage size between the two tables, derived from the following:

  • Each row in the varchar_length_test_256 table contains an additional character than the rows in the varchar_length_test_255 table. With the latin1 character set, that adds up to 10,000 bytes, since there are 10,000 rows in each table.
  • Based on the MySQL documentation, VARCHAR values exceeding 255 bytes require an additional "length" byte. Since each row in the varchar_length_test_256 table contains a value in the characters column that has a length of 256, which equates to 256 bytes for each value since the latin1 character set is used, that adds up to another 10,000 bytes utilized.

Problem

When issuing a query to retrieve the size of each table, it appears that the tables are the same size! I used the following query (based on off of this SO post) to determine the size of each table:

SELECT 
  table_name AS `Table`,
  (data_length + index_length) `Size in Bytes`
FROM 
  information_schema.TABLES
WHERE 
  table_schema = "test";

which yielded this output:

+-------------------------+---------------+
| Table                   | Size in Bytes |
+-------------------------+---------------+
| varchar_length_test_255 |       4734976 |
| varchar_length_test_256 |       4734976 |
+-------------------------+---------------+
2 rows in set (0.00 sec)

What am I missing here?

  • Am I correctly understanding the MySQL documentation?
  • Is there something wrong with my test that is preventing the expected outcome?
  • Is the query I am using to calculate the size of the tables correct?
  • How could I correctly observe the information communicated in the MySQL documentation?

1 Answers1

0

Check he data_free column too.

InnoDB stores data on so called 'pages' which are 16KB in size (by default). When a page is almost full, and you insert a new record, but it can't fit on the page, MySQL will open a new page leaving the leftover space empty.

It is my assumption, that MySQL reports the number of pages times the page size as data/index sizes.

This is the effective size used on the OS to store the table's data, not the actual size stored on those pages.

Update: https://mariadb.com/kb/en/library/information-schema-tables-table/

On this page (even if it is MariaDB, but the storage engine is the same) the descrtiption of data_lenght is the following:

For InnoDB/XtraDB, the index size, in pages, multiplied by the page size. For Aria and MyISAM, length of the data file, in bytes. For MEMORY, the approximate allocated memory.

Edit (some calculations)

16 KB = 16384 B

              Storage (B)   # of record   # of pages
                            on a page
---------------------------------------------------
varchar(255)  256           64            156.25
varchar(256)  258           63.5          158.73

As you see the raw data (with the length marker) can be stored on almost the same amount of pages.

Due to the fact that a page is not necessary filled to 100% (however innodb_fill_factor defaults to 100) and there is some overhead in the row structure, this little difference won't necessarily visible.

The database files are not like a csv file, but they have to handle multiple things such as NULL values, row size when it is varying, etc which takes up additional space.

More about the InnoDB Row Structure: https://dev.mysql.com/doc/refman/5.5/en/innodb-physical-record.html

Pred
  • 8,789
  • 3
  • 26
  • 46
  • Thank you for the explanation of `pages`, as well as how `data_length` and `index_length` was calculated; I found another link [here](https://dev.mysql.com/doc/refman/5.7/en/show-table-status.html) that I believe confirms what you mentioned. However, I am unsure how leveraging the `data_free` column will help get me closer to figuring out the actual amount of utilized bytes of each table. Regardless, it appears that both tables have an identical value for `data_free` (`4194304`) so still both tables appear to be consuming the same amount of space. – Freer Factor Nov 30 '17 at 03:48
  • It is also my understanding that since my theory is that there should be a difference of `20,000` bytes, that both of these tables should not be consuming the same amount of pages, since `20,000` bytes > `16` KB. – Freer Factor Nov 30 '17 at 03:49
  • I have added some calculations and further explanation. Please see the edit. – Pred Nov 30 '17 at 08:26