6

I'm experimenting with improving the performance of a certain table in my company's database. This table has 7.9 6.9 million rows with the format:

mysql> show fields from BroadcastLog;
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| id            | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| broadcast_id  | int(10) unsigned | YES  | MUL | NULL    |                |
| author_id     | int(10) unsigned | YES  | MUL | NULL    |                |
| type          | int(11)          | NO   | MUL | NULL    |                |
| origin        | int(11)          | YES  | MUL | NULL    |                |
| date_created  | datetime         | NO   | MUL | NULL    |                |
| date_modified | datetime         | NO   |     | NULL    |                |
| old_status    | int(10) unsigned | YES  | MUL | NULL    |                |
| new_status    | int(10) unsigned | YES  | MUL | NULL    |                |
| json_data     | text             | YES  |     | NULL    |                |
| log_text      | text             | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
11 rows in set (0.01 sec)

One of the first places I wanted to look to improve this was changing the two text fields to varchar fields which I know to generally be more efficient. So I gave it a try:

mysql> alter table BroadcastLog modify log_text varchar(2048);
Query OK, 0 rows affected, 1 warning (1 min 13.08 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show warnings;
+-------+------+---------------------------------------------------+
| Level | Code | Message                                           |
+-------+------+---------------------------------------------------+
| Note  | 1246 | Converting column 'log_text' from VARCHAR to TEXT |
+-------+------+---------------------------------------------------+
1 row in set (0.01 sec)

It didn't convert!

I tried to get clever. Let's create a new (temporary) column, copy the data, drop the old column, then rename the new one:

mysql> alter table BroadcastLog add column log_text_vc varchar(2048);
Query OK, 0 rows affected, 1 warning (1 min 13.08 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show warnings;
+-------+------+---------------------------------------------------+
| Level | Code | Message                                           |
+-------+------+---------------------------------------------------+
| Note  | 1246 | Converting column 'log_text' from VARCHAR to TEXT |
+-------+------+---------------------------------------------------+
1 row in set (0.01 sec)

Couldn't even create a new column!

I tried to get clever-er. Create a new table, copy the data, drop the old columns, copy the data back:

mysql> create table tmp (id INT UNSIGNED PRIMARY KEY, json_data VARCHAR(1024), log_text VARCHAR(2048));
Query OK, 0 rows affected (0.04 sec)

mysql> insert into tmp (id, json_data, log_text) select id, json_data, log_text from BroadcastLog;
Query OK, 6939076 rows affected (5 min 28.12 sec)
Records: 6939076  Duplicates: 0  Warnings: 0

mysql> alter table BroadcastLog drop column json_data;
Query OK, 0 rows affected (1 min 12.36 sec)
Records: 0  Duplicates:  0 Warnings: 0

mysql> alter table BroadcastLog drop column log_text;
Query OK, 0 rows affected (1 min 9.10 sec)
Records: 0  Duplicates:  0 Warnings: 0

mysql> alter table BroadcastLog add column json_data varchar(1024);
Query OK, 0 rows affected (1 min 11.52 sec)
Records: 0  Duplicates:  0 Warnings: 0

mysql> alter table BroadcastLog add column log_text varchar(2048);
Query OK, 0 rows affected (1 min 15.41 sec)
Records: 0  Duplicates:  0 Warnings: 1

mysql> show warnings;
+-------+------+---------------------------------------------------+
| Level | Code | Message                                           |
+-------+------+---------------------------------------------------+
| Note  | 1246 | Converting column 'log_text' from VARCHAR to TEXT |
+-------+------+---------------------------------------------------+
1 row in set (0.01 sec)

mysql> show fields from BroadcastLog;
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| id            | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| broadcast_id  | int(10) unsigned | YES  | MUL | NULL    |                |
| author_id     | int(10) unsigned | YES  | MUL | NULL    |                |
| type          | int(11)          | NO   | MUL | NULL    |                |
| origin        | int(11)          | YES  | MUL | NULL    |                |
| date_created  | datetime         | NO   | MUL | NULL    |                |
| date_modified | datetime         | NO   |     | NULL    |                |
| old_status    | int(10) unsigned | YES  | MUL | NULL    |                |
| new_status    | int(10) unsigned | YES  | MUL | NULL    |                |
| json_data     | varchar(1024)    | YES  |     | NULL    |                |
| log_text      | mediumtext       | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
11 rows in set (0.01 sec)

So one field was created properly, but the other was still converted to TEXT, despite the field being completely empty with no data in it

I've been Googling around to try and find an answer, but thus far I've turned up nothing.

Create Table Statement

Per the comments, here's the create table statement (after my above changes, so the log_text column and json_data column on my local database may not match the original data I pulled from our production database this morning):

mysql> show create table BroadcastLog\G
*************************** 1. row ***************************
       Table: BroadcastLog
Create Table: CREATE TABLE `BroadcastLog` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `broadcast_id` int(10) unsigned DEFAULT NULL,
  `author_id` int(10) unsigned DEFAULT NULL,
  `type` int(11) NOT NULL,
  `origin` int(11) DEFAULT NULL,
  `date_created` datetime NOT NULL,
  `date_modified` datetime NOT NULL,
  `old_status` int(10) unsigned DEFAULT NULL,
  `new_status` int(10) unsigned DEFAULT NULL,
  `log_text` mediumtext,
  PRIMARY KEY (`id`),
  KEY `old_status` (`old_status`),
  KEY `new_status` (`new_status`),
  KEY `broadcast_id` (`broadcast_id`),
  KEY `author_id` (`author_id`),
  KEY `log_type_and_origin` (`type`,`origin`),
  KEY `log_origin` (`origin`),
  KEY `bl_date_created` (`date_created`),
  CONSTRAINT `fk_BroadcastLog_author_id` FOREIGN KEY (`author_id`) REFERENCES `User` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
  CONSTRAINT `fk_BroadcastLog_broadcast_id` FOREIGN KEY (`broadcast_id`) REFERENCES `Broadcast` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=6941898 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

MySQL Version

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.31    |
+-----------+
1 row in set (0.01 sec)

Updated

I updated MySQL and got the same results:

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.23    |
+-----------+
1 row in set (0.00 sec)

mysql> show fields from BroadcastLog;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int unsigned | NO   | PRI | NULL    | auto_increment |
| broadcast_id  | int unsigned | YES  | MUL | NULL    |                |
| author_id     | int unsigned | YES  | MUL | NULL    |                |
| type          | int          | NO   | MUL | NULL    |                |
| origin        | int          | YES  | MUL | NULL    |                |
| date_created  | datetime     | NO   |     | NULL    |                |
| date_modified | datetime     | NO   |     | NULL    |                |
| log_text      | text         | NO   |     | NULL    |                |
| json_data     | text         | YES  |     | NULL    |                |
| old_status    | int unsigned | YES  | MUL | NULL    |                |
| new_status    | int unsigned | YES  | MUL | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
11 rows in set (0.00 sec)

mysql> alter table BroadcastLog modify log_text varchar(2048);
Query OK, 6939076 rows affected, 1 warning (3 min 22.64 sec)
Records: 6939076  Duplicates: 0  Warnings: 1

mysql> show warnings;
+-------+------+---------------------------------------------------+
| Level | Code | Message                                           |
+-------+------+---------------------------------------------------+
| Note  | 1246 | Converting column 'log_text' from VARCHAR to TEXT |
+-------+------+---------------------------------------------------+
1 row in set (0.01 sec)

mysql> show fields from BroadcastLog;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int unsigned | NO   | PRI | NULL    | auto_increment |
| broadcast_id  | int unsigned | YES  | MUL | NULL    |                |
| author_id     | int unsigned | YES  | MUL | NULL    |                |
| type          | int          | NO   | MUL | NULL    |                |
| origin        | int          | YES  | MUL | NULL    |                |
| date_created  | datetime     | NO   |     | NULL    |                |
| date_modified | datetime     | NO   |     | NULL    |                |
| log_text      | mediumtext   | YES  |     | NULL    |                |
| json_data     | text         | YES  |     | NULL    |                |
| old_status    | int unsigned | YES  | MUL | NULL    |                |
| new_status    | int unsigned | YES  | MUL | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
11 rows in set (0.02 sec)

I will note one difference I saw in the output, which could have an alternate explanation: it now asys "6939076 rows affected" instead of "0 rows affected". Although I spent a couple hours trying to make sense of this behavior so I had already run multiple ALTER TABLE statements before I even started this SO thread. It's possible you only get rows affected the first time you try to change the column and I just missed it. It's also possible MySQL 8 just uses a different metric for "affected rows" and has different output.

Either way, still not converting to VARCHAR for some reason

stevendesu
  • 15,753
  • 22
  • 105
  • 182
  • Can you show the full `SHOW CREATE TABLE` version and not the `DESCRIBE` summary? Tip: Use `\G` to get clean output. If this is a MyISAM table, that might explain a lot. – tadman Mar 02 '21 at 21:41
  • I updated the post with my CREATE TABLE statement. It's an InnoDB table, not MyISAM. – stevendesu Mar 02 '21 at 21:52
  • Next logical question: What MySQL version? Support for >255 characters is a "new" thing, so if you're using something like 5.6 or older it might not work. – tadman Mar 02 '21 at 21:52
  • @tadman Updated again with the MySQL version. It's 5.7.31 – stevendesu Mar 02 '21 at 21:53
  • Not sure if you can do a quick test on 8.0 but that quirk may have been resolved. I've never used >255 on MySQL because it's had a history of being fussy about it. I can only test here on 8.0 and it seems fine. – tadman Mar 02 '21 at 21:55
  • @tadman I can test out MySQL 8.0 tomorrow morning. Work day is ending :/ But the database is running in Docker so I just need to spin it up with a different image and then re-import all of the data (which takes about 20 minutes) – stevendesu Mar 02 '21 at 21:59
  • Check out this one https://stackoverflow.com/questions/332798/equivalent-of-varcharmax-in-mysql If you used a mult-byte character set such as UTF-8 then may be your data won't fit into a varchar column. But it may fit into a text column. – Onur Baştürk Mar 02 '21 at 22:25
  • @tadman MySQL VARCHAR has supported up to 65,535 bytes since version 5.0.3 (2005-03-23). That's "new"? Cf. https://docs.oracle.com/cd/E17952_01/mysql-5.0-relnotes-en/news-5-0-3.html – Bill Karwin Mar 02 '21 at 22:51
  • @BillKarwin Ah, you're right. I'm probably thinking of the issues with UTF-8 handling on indexed large columns where the 768 character limit kicks in. – tadman Mar 03 '21 at 00:58
  • 1
    @tadman I've tested with MySQL 8.0 this morning -- still can't convert to `VARCHAR`. No explanation as to why – stevendesu Mar 03 '21 at 15:52

2 Answers2

5

After some further experimentation and research, I figured out why MySQL wasn't allowing me to convert these field types. The issue for my specific case of only trying to make a VARCHAR(2048) is caused by a poorly configured MySQL instance, but the issue in general could apply to anyone trying to make a VARCHAR(16378) or larger with default configuration.

The main difference between VARCHAR is TEXT is that VARCHAR is stored in the same file on disk as all of the other data in the table whereas TEXT is stored in a separate file elsewhere on disk. This means that when you read a page from disk you are implicitly reading the data for VARCHAR fields, but the data for TEXT fields is only read explicitly if you request those fields (e.g. via SELECT * or naming the field in your SELECT statement). This is the source of the performance improvement of VARCHAR over TEXT.

Because VARCHAR is stored in the table file on disk, a field can only be set to a VARCHAR if the field would fit in the page file. While MySQL's documentation claims that VARCHAR fields have a maximum length of 65535, it does not enforce page files to be 65535 bytes long. In fact, that's not even the default:

https://dev.mysql.com/doc/refman/8.0/en/innodb-init-startup-configuration.html

A minimum file size is enforced for the first system tablespace data file to ensure that there is enough space for doublewrite buffer pages. The following table shows minimum file sizes for each InnoDB page size. The default InnoDB page size is 16384 (16KB). [emphasis mine]

Without modifying your page size, you can't make a VARCHAR field larger than 16384 (actually, you can't make a field larger than 16377, because of how VARCHAR is stored on disk: 4 bytes for a pointer, 2 bytes for a length, 1 byte boolean to declare whether or not it's null, then the actual data is stored in the "variable length" portion of the page)

My problem came from the fact that we have much smaller page sizes configured.

Conclusion

If you ever try to make a VARCHAR field and MySQL automatically converts it to a TEXT field, check your InnoDB page size:

SHOW VARIABLES LIKE "%page_size%";

You're probably trying to make a field that won't fit on the page.

Unfortunately, changing your page size isn't straight-forward. You have to create a whole new database and copy the data over.

stevendesu
  • 15,753
  • 22
  • 105
  • 182
  • How big is the page size in your case for your tables/databases? When you sum up the sized from the other columns, do you come near your page size? – Progman Mar 22 '21 at 17:46
  • @Progman My database was configured with a 2 KB page size. – stevendesu Mar 22 '21 at 17:50
-3

Smaller VARCHARs have some obscure advantages over TEXT and especially TINYTEXT.

But, since you what 2K characters, there is no advantage of VARCHAR(2048) over TEXT except for one thing -- complaining if you try to insert more than 2048 characters.

DESCRIBE seems to say that log_text is TEXT. But the SHOW CREATE TABLE (which I trust more) says it is MEDIUMTEXT (with a limit of 16MBytes).

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • 1
    It was originally `TEXT` but when I tried to convert it to a `VARCHAR(2048)` MySQL auto-converted it to a `MEDIUMTEXT`. So `DESCRIBE` also shows `MEDIUMTEXT` now. As far as advantages, I operate on the general rule of thumb that "CHAR is faster than VARCHAR which is faster than TEXT" but I always confirm my assumptions through experiment (which I'm trying to do now). When I Google'd it I read that VARCHAR is strictly more performant, up to the maximum of 65535 characters. That said, your "answer" doesn't address my question at all - WHY is MySQL not letting me change the field type? – stevendesu Mar 03 '21 at 13:21
  • `CHAR` was faster than `VARCHAR` in very limited situations when using MyISAM. That is, that is an "old wives' tale". `VARCHAR` is faster than `TEXT` in a few situations, but only for smaller values. I will argue that those Google sites are wrong about "strictly more performant". – Rick James Mar 03 '21 at 17:51
  • As for "why" did it change it? I don't know the answer [yet]. 2048 --> MEDIUM should not have a measurable difference. It does lose the test for "max 2048". – Rick James Mar 03 '21 at 17:52
  • 1
    I see that I am getting dinged repeatedly about this Answer. But I stand by what I say. And I welcome targeted criticism. – Rick James Mar 04 '21 at 00:15
  • 1
    I personally downvoted the answer because it doesn't **answer** anything. It provides a small bit of trivia ("there is no advantage of `VARCHAR(2048)` over `TEXT` except for one thing -- complaining if you try to insert more than 2048 *characters*"), but doesn't address the question I asked: **why** is MySQL not letting me change the field type? If your post doesn't answer the question it's better suited to a comment than an answer. – stevendesu Mar 04 '21 at 14:49
  • with regards to `VARCHAR` and `TEXT` being indistinguishable, I did some further research in the MySQL docs. It looks like LOGICALLY they're identical, but the implementation does vary pretty significantly. `VARCHAR` fields are stored in the variable-length portion of the table pages, so they're loaded from disk with the table. `TEXT` fields are stored off-table in a separate file. So if you're frequently selecting the field (or querying by it), `VARCHAR` will be faster – stevendesu Mar 04 '21 at 21:01