12
mysql> desc temp1;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| value | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+

mysql> desc temp2;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| value | text | YES  |     | NULL    |       |
+-------+------+------+-----+---------+-------+

255 - 'a' characters in each row(In both tables)

mysql> select * from temp1 limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

mysql> select * from temp2 limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Query table 1:

select count(*) from temp1 where value like '%a';

Query table 2:

select count(*) from temp2 where value like '%a';

Stats:

No of records---temp1(varchar)---temp2(text)


2097152---------6.08(sec)--------6.91(sec)          
4194304---------12.42(sec)-------13.66(sec)
8388608---------25.08(sec)-------28.03(sec)
16777216--------52.82(sec)-------56.88(sec)
33554432--------1(min)50.17(sec)-1(min)59.36(sec)

My question: How can the difference in execution speed be explained?

The rows contents are same in both tables.

As I understood VarChar and Text columns keep contents offPage only when it exceeds row size. So both tables contents will be inline data for my page size(16kb). Then what was the reason for this query execution time difference.

Note: Both table column is not indexed

Row Format - DYNAMIC

Collation - UTF8mb3

Character set - utf8_general_ci

Storage engine -  innodb

Mysql - 5.7

Reference link: https://stackoverflow.com/a/48301727/5431418

Update: Same flow now I tried with 5000 characters ('a') in both tables the result difference is high.

2097152---------1(min)53.63(sec)--------2(min)4.66(sec)    

Update 2: Same flow now I tried with 2 characters ('a') in both tables still there is a performance difference

Adding table status:

mysql> select * FROM information_schema.tables  WHERE table_schema = "db67006db" and table_name = 'temp1';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | db67006db    | temp1      | BASE TABLE | InnoDB |      10 | Dynamic    |   30625036 |            315 |  9659482112 |               0 |            0 | 425721856 |           NULL | 2019-09-23 20:20:17 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (0.01 sec)

mysql> select * FROM information_schema.tables  WHERE table_schema = "db67006db" and table_name = 'temp2';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | db67006db    | temp2      | BASE TABLE | InnoDB |      10 | Dynamic    |   30922268 |            315 |  9753853952 |               0 |            0 | 425721856 |           NULL | 2019-09-23 20:20:12 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
vinieth
  • 1,204
  • 3
  • 16
  • 34
  • did you try MySQL Workbench? – diginoise Sep 30 '19 at 11:16
  • see this answer: https://stackoverflow.com/questions/6628660/text-vs-varchar-in-innodb-mysql-5-5-when-to-use-each-one TL;DR: **TEXT** offloads temp tables to disk durring **SELECT** – diginoise Sep 30 '19 at 11:30
  • @diginoise for the given query - temp tables won't be created. Temporary tables will be used if these cases are there in query -> `UNION, derived tables, different orderBy and groupBy, groupConcat(), count(Distinct()), distinct combined with orderBy, orderBy or groupBy contains columns from tables other than the first table in the join.` https://dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.html – vinieth Sep 30 '19 at 11:56
  • See this https://stackoverflow.com/questions/2023481/mysql-large-varchar-vs-text and take time to read the answers and the comments as well. You might find your answer there – FanoFN Oct 01 '19 at 08:22
  • 2
    I have reproduced the test and got similar results. But what surprised me more, is that there is still a difference when the WHERE clause is omnited. – Paul Spiegel Oct 01 '19 at 16:15
  • @PaulSpiegel If WHERE condition is removed still there is a difference. The difference is based on the select column (TEXT vs VARCHAR). Query like below will make difference. `select VARCHARCOLUMN from temp1 limit 400000,1'` and `select TEXTCOLUMN from temp2 limit 400000,1'`. Here the text column will take more time. Even if both VARCHAR and TEXT column's contains only two 'a' characters. – vinieth Oct 02 '19 at 06:12
  • Please provide `SHOW TABLE STATUS`; – Rick James Oct 04 '19 at 04:29
  • @RickJames updated the table status – vinieth Oct 08 '19 at 06:49

4 Answers4

2

let's use some tools

Since the initial hunch (see below) was a miss, try running your query via MySQL Workbench in order to gather Query Performance Stats.


initial hunch (no result)

Just a thought:

  • TEXT column size on disk is 2 + N bytes where N is the length of the string
  • VARCHAR takes 1 + N bytes (for N ≤ 255) or 2 + N bytes (for 256 ≤ N ≤ 65535)

Try extending the size of the text in the column above 256 characters and re-run your tests. Potentially they will run with performance more closely matched.

Please also mind that the differences you post are expressed in microseconds per record, so there could be many OS events getting in the way or very simple if (TEXT) {do some additional IO or housekeeping} code path in the source.

diginoise
  • 7,352
  • 2
  • 31
  • 39
  • Update: Same flow now I tried with 5000 characters ('a') in both tables the result difference is high. ```2097152---------1(min)53.63(sec)--------2(min)4.66(sec)``` – vinieth Sep 23 '19 at 18:32
  • `VARCHAR(255)` with `utf8` requires 2 bytes to store the length. So "*VARCHAR takes 1 + N*" is not correct. – Paul Spiegel Sep 30 '19 at 11:14
  • All the timings (so far) are 10-15% difference. That's high enough to be suspicious, but much too high to be explained by a 1-byte difference in the length field. – Rick James Oct 04 '19 at 04:37
  • @RickJames the initial hunch about that 1 byte has been marked with **no result**. I had also suggested swapping of temp tables to disk. I still think that WORKBENCH could show something interesting. – diginoise Oct 04 '19 at 08:23
  • Provide `EXPLAIN FORMAT=JSON SELECT ...` and "optimizer trace" to see if there are any visible differences. – Rick James Oct 04 '19 at 17:58
  • @diginoise there isn't any temp tables created for this query. Couldn't find anything extra in WORKBENCH. As @Paul Spiegel told in the comment `I have reproduced the test and got similar results. But what surprised me more, is that there is still a difference when the WHERE clause is omitted.` Yes this difference is still there when WHERE is omitted and only select is used. Like this - `select VARCHARCOLUMN from temp1 limit 400000,1' and select TEXTCOLUMN from temp2 limit 400000,1` – vinieth Oct 08 '19 at 07:03
  • 1
    @vinieth Note that I have tested `SELECT COUNT(*)`, which shouldn't read any data column. – Paul Spiegel Oct 08 '19 at 16:54
2

With respect to storage, InnoDB will handle VARCHAR and TEXT much the same when both stored inline. However, when fetching the data from InnoDB, the server will allocate space for all VARCHAR columns before query execution. While space for TEXT columns will only be allocated if they are actually read, where DYNAMIC memory allocation takes time.

https://forums.mysql.com/read.php?24,645115,645164#msg-645164

vinieth
  • 1,204
  • 3
  • 16
  • 34
0

TEXT type will be always slower than VARCHAR because those types have different methods of storage. VARCHAR field stored in the table with all columns but TEXT stored differently. Each TEXT value is a separate object. It means if you want to do something with TEXT value MySQL will make additional operations to get that object.

Quote from the official documentation:

Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.

Maksym Fedorov
  • 6,383
  • 2
  • 11
  • 31
  • 1
    With respect to storage, InnoDB will handle VARCHAR and TEXT much the same. Check out this link https://forums.mysql.com/read.php?24,645115,645164#msg-645164 – vinieth Sep 30 '19 at 12:22
  • @vinieth This topic about In-Memory Structures of InnoDB, but before to move to memory the data is stored at disk – Maksym Fedorov Sep 30 '19 at 13:25
  • 1
    @oysteing - In https://forums.mysql.com/read.php?24,645115,645246#msg-645246 you point out a memory allocation difference -- but is that enough to explain 10-15% difference in speed? – Rick James Oct 04 '19 at 04:45
  • @RickJames As of now only thing i can point out for this difference is What oysteing told. – vinieth Oct 08 '19 at 06:58
0

Your first case assumption is not correct. Based on Storage Requirements TEXT stores one more byte for 255 a than VARCHAR so for 33554432 records in your table you need 33554432 more bytes to load in memory and explains the time delay.

That of course would not apply for 5000 a where based on the same documentation the size is the same L + 2 bytes. But i think the reason for that delay is described in Row Size Limits where it writes:

The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, even if the storage engine is capable of supporting larger rows. BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.

I think it is quite different to be part of the row data and to be stored separately(need some time to retrieve it from the location stored) and this explains the time delay.

  • 5000 characters(5000byte for my case) will be stored in inline until it exceeds the row size, where my row size 8126bytes. – vinieth Oct 08 '19 at 06:55
  • `BLOB` and `TEXT` are different for MySQL than `VARCHAR`. Even if number of bytes is the same, one is part of the row, while the other has a 9-12 footprint in row and the actual data are stored elsewhere. so you need one more logical "step" to fetch it. That is what makes the time difference. –  Oct 08 '19 at 23:52
  • @PeterDarmis - That depends on the `ROW_FORMAT`. And I think the value is a constant 20. – Rick James Oct 10 '19 at 04:05
  • @RickJames let's consider `DYNAMIC` and `utf8mb4` , the 9-12 part is taken from official MySQL documentation i did not made it up :), yet your comment is indeed highly accurate. If one takes a look at https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html he/she would read ... –  Oct 10 '19 at 10:21
  • @RickJames ... `For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes are only needed if part of the column is stored externally in overflow pages or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the 2-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768+20. The 20-byte pointer stores the true length of the column.`. –  Oct 10 '19 at 11:47
  • @RickJames so basically the driver does the picking on what will be off-page. By the way i don't see what is described in the question in this fiddle (although the data-set is too small) http://sqlfiddle.com/#!9/cc2a4/2 `TEXT` appears quicker... –  Oct 10 '19 at 11:49
  • @PeterDarmis - I am suspicious of the official documentation because of the number of changes over the years and the number of different people involved. Some inconsistencies are very subtle. The "one or two bytes". As for the fiddle -- there is no indication of on/off-page, etc. And there should not be. – Rick James Oct 10 '19 at 15:15
  • @RickJames of course you can't find any actual proof of something in that fiddle. But i would think that if one was to test something like the question's real topic, then he/she should try creating tables with all different types of `ROW_FORMAT` and for different charsets. Only from the combination of those two you have a lot of tests to do. Yet i don't think that you will consistency in the time difference if any. –  Oct 10 '19 at 15:20
  • @PeterDarmis - The time difference was an eye-opener. It is a subtle difference that I did not know about. I distill it thus: "Don't use `TEXT` unless you need it." Closely related: "Never use `TINYTEXT`." – Rick James Oct 10 '19 at 15:25