
I have a table with two simple columns and two mediumtext columns that looks like this:

create table page (
  id bigint auto_increment primary key,
  status tinyint not null,
  content mediumtext,
  screenshot mediumtext
) row_format=compressed;

The table stores a web page's entire source and an encoded screenshot, with the former being as large as 7 MB and the latter around 5 MB (the average value for both columns is about 500 KB to 2 MB).

The page table has only 50k records, which isn't much these days, but it is about 20 GB in size. When I tried to add a new simple column, it took almost an hour:

alter table page add column comment varchar(255);

Meanwhile, when I add the same comment column to another table with 50k records and no text columns, it completes within seconds.

Here's where I'm curious: I thought that text columns were more like pointers to the actual data, so adding a new column should not take long because we're not touching the text data. But given the long duration, it's almost like we're restructuring the entire table, which is concerning because it'll make future DDL difficult. What might be happening in this case, and are there transaction, lock, or metadata tables I can query to get more insight? I have innodb_file_per_table=on.
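
For reference, these are the sorts of places I was planning to look, though I'm not sure they'll show what I need (I'm assuming MySQL 8.0 table names; in 5.7 some of these differ, e.g. innodb_sys_tables, and the metadata-lock/stage instruments may need to be enabled in performance_schema):

-- open transactions, in case something is blocking the alter
select * from information_schema.innodb_trx;

-- table-level metadata locks held or waited on while the alter runs
select object_schema, object_name, lock_type, lock_status, owner_thread_id
from performance_schema.metadata_locks
where object_name = 'page';

-- rough progress of a long-running alter, if stage instruments are enabled
select event_name, work_completed, work_estimated
from performance_schema.events_stages_current
where event_name like 'stage/innodb/alter%';

-- confirm the row format actually in use for the tablespace
select name, row_format from information_schema.innodb_tables
where name like '%/page';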

Another curiosity: I recall adding a new column to this same big table once before, and it was a nearly instant operation. Assuming I remember correctly, are there certain operations that restructure the entire table versus ones that do not?

J-Deq87
  • Is this on the InnoDB engine? – tadman Aug 17 '22 at 20:55
  • What happens if you don't compress your tables? By using that I assume performance isn't your primary goal. – stdunbar Aug 17 '22 at 20:55
  • Is it really necessary to save the whole web page and to keep the screenshot in the database? Small pictures, and not a lot of them, would be one thing, but saving 5 MB of binary data looks like you should rethink your strategy. – nbk Aug 17 '22 at 21:14
  • @tadman - this is on innodb. – J-Deq87 Aug 18 '22 at 14:34
  • @stdunbar - compression can be a mixed bag but I'll try without compression. In most of our workloads compression has helped by cutting down on IO overhead which was our bottleneck (we have plenty of CPU to compress/decompress to spare). – J-Deq87 Aug 18 '22 at 14:35
  • @nbk - There's some historical reasons around this that we can't fix just yet. What I'm mainly trying to figure out is how/why this table structure might be slow adding new columns. – J-Deq87 Aug 18 '22 at 14:36
  • The size per se is not a problem; it is the size of the row. Text and binary are saved in the table, so they must get pulled from the hard drive, and every select will pull hard. Besides that, it writes temp data and more, so as the table gets bigger and bigger it will get slower and slower. Also a good read: https://stackoverflow.com/questions/1276/how-big-can-a-mysql-database-get-before-performance-starts-to-degrade – nbk Aug 18 '22 at 14:42
  • @nbk - Unless you actually select the text column (or do `select *`), the optimizer is smart enough not to pull it. But it would make sense that since text/binary are stored within the same table file, certain DDL operations will force a painful restructuring across _all_ data. That's probably what's happening. My queries are fine but the DDL is slow. – J-Deq87 Aug 18 '22 at 15:35
  • Altering a table will force the database to check every row to see if it still fits; having text and binary doesn't make it easier, since it pulls them all into consideration. As the change will occur only once, nobody is interested in how long it takes. – nbk Aug 18 '22 at 15:41
  • If this were a recurring problem, I would suggest building a 'parallel' table for your extra column(s). – Rick James Aug 18 '22 at 18:38
  • @J-Deq87 - Perhaps you meant DML, not DDL? Yes, `ALTER` is slow because in most cases it must uncompress and recompress to perform the task. A non-compressed table can do a variety of "instant" alters. – Rick James Aug 21 '22 at 04:14

1 Answer


I think the answer is that the whole row (or whole 'block'?) is "compressed". Hence, adding a column to a compressed table requires uncompressing, adding the column, then re-compressing.
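
If I'm reading the online-DDL rules right (worth double-checking against your exact version), ROW_FORMAT=COMPRESSED is one of the things that disqualifies the "instant" ADD COLUMN path in 8.0, so the server falls back to rebuilding the tablespace. A sketch of how that shows up, assuming MySQL 8.0:

-- on the compressed table this should be refused outright
-- (something like "ALGORITHM=INSTANT is not supported"), which is the
-- server's way of saying a rebuild is unavoidable
ALTER TABLE page ADD COLUMN comment VARCHAR(255), ALGORITHM=INSTANT;

-- converting the row format is itself a slow full rebuild, but afterwards
-- a simple column add should be metadata-only
ALTER TABLE page ROW_FORMAT=DYNAMIC;
ALTER TABLE page ADD COLUMN comment VARCHAR(255), ALGORITHM=INSTANT;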

When I have bulky text, I prefer to compress the text in the client, then store it into a MEDIUMBLOB. This cuts down on network traffic and server CPU effort. (Granted, the CPU is not an issue for you.)
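
Roughly the sort of table I have in mind; this is only a sketch, assuming the application compresses the bytes (gzip, zstd, etc.) before the INSERT and decompresses after the SELECT:

CREATE TABLE page (
  id bigint AUTO_INCREMENT PRIMARY KEY,
  status tinyint NOT NULL,
  content MEDIUMBLOB,      -- page source, compressed by the application
  screenshot MEDIUMBLOB    -- screenshot bytes, compressed by the application
);                         -- note: no ROW_FORMAT=COMPRESSED

With the data already compressed by the client, there is little reason to keep ROW_FORMAT=COMPRESSED, which should also put simple column additions back on the fast path.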

It also may give better compression:

  • InnoDB's compression gives about 2:1
  • Virtually any text compressed with any of the popular compression tools gives about 3:1 compression.

Yes, the list of which ALTERs are fast or slow is very much a function of the layout of the data structures. For example, almost any change to the PRIMARY KEY requires a full rebuild (copying the table over), but many non-unique secondary index changes are "instant" (though some work may happen after the Alter returns).
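
One trick that saves memorizing the whole matrix: name the algorithm you are hoping for in the ALTER itself, and the server will refuse the statement rather than silently fall back to a full copy. A sketch (INPLACE/LOCK syntax needs 5.6+, INSTANT needs 8.0; idx_status is just an example index name):

-- a non-unique secondary index can usually be built in place,
-- without rebuilding the table, while DML continues
ALTER TABLE page ADD INDEX idx_status (status), ALGORITHM=INPLACE, LOCK=NONE;

-- a PRIMARY KEY change cannot be metadata-only; asking for INSTANT
-- makes the server error out instead of quietly copying 20 GB
ALTER TABLE page DROP PRIMARY KEY, ADD PRIMARY KEY (id, status), ALGORITHM=INSTANT;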

Also, the version matters. Certain "instant" ENUM changes were implemented long ago. The rest came in various stages with 5.6, 5.7, and 8.0. Occasionally someone comes up with another clever idea and yet another Alter becomes faster.

A general rule:

  • If the Alter can be effected by changing just the table definition, it may be "instant".
  • If the bytes on disk need to be changed or simply rearranged, it may involve copying over the entire table and/or rebuilding the index(es). (Of course, this requires you to know the internal implementation. Still, some of this can be easily guessed; or ask here for help.)
Rick James
  • Also worth mentioning that MySQL's compression might perform more poorly than native filesystem compression (e.g. ZFS) so if you have the option to do that, skip the MySQL compression. It will only get in the way. – tadman Aug 18 '22 at 18:53
  • @tadman - What does ZFS do? Leave a lot of holes? Or have some mapping? – Rick James Aug 18 '22 at 20:59
  • It compresses at the block level, which can lead to substantial performance improvements. CPUs are way faster than the IO bandwidth to the drive, so compressing data means less data is read/written, giving you more bang for your buck. ZFS also allows you to pick which compression method to use, which means you can elect for faster (if CPU load is a concern) vs. better compression. – tadman Aug 18 '22 at 21:05
  • @tadman - "block" = 4KB? 16KB? If it shrinks 16KB to 5KB, does it 'waste' the other 11KB (else the addressing would be messed up)? – Rick James Aug 19 '22 at 01:38
  • I'm not intimately familiar with the internals of ZFS but they are documented if you're curious. This isn't the first filesystem with built-in compression. Even [DOS had it in 1993](https://en.wikipedia.org/wiki/DriveSpace), and NTFS still has it to this day. For certain kinds of workloads, compression can provide huge gains. For others, like storing media assets that are already compressed (e.g. video), it's a waste of time. This is why it can be configured for each ZFS dataset individually. – tadman Aug 20 '22 at 22:19
  • @RickJames - I didn't think about decompression of the rows in order to do the DDL. Should have been more obvious but skipped my mind. Thanks for the post. I'm testing app-side compression as an alternative. – J-Deq87 Aug 23 '22 at 17:30