2

I was wondering what happens to the binlog when run an alter using pt-online-schema-change or gh-ost?

for the pt-online-schema-change I have read that it copies the table and use some triggers to apply the changes. I don't know if it create a table from the beginning with the new schema or it just apply the alter after copying the table?

if it alters the table from the beginning, then what happens to binglog? is the positions different than the previous binglog?

joe gates
  • 459
  • 1
  • 5
  • 15

1 Answers1

1

pt-online-schema change copies the table structure and applies the desired ALTER TABLE to the zero-row table. This is virtually instantaneous. Then it creates triggers to mirror changes against the original table. Then it starts copying old data from the original table to the new table.

What happens to the binlog? It gets quite huge. The CREATE TABLE and ALTER TABLE and CREATE TRIGGER are pretty small. DDL is always statement-based in the binlog. The DML changes created by the triggers and the process of copying old data become transactions in the binlog. We prefer row-based binlogs, so these end up being pretty bulky.

gh-ost is similar, but without the triggers. gh-ost reads the binlog to find events that applied to the old table, and it applies those to the new table. Meanwhile, it also copies old data. Together these actions result in a similar volume of extra events in the binlog as occur when using pt-online-schema-change.

So you should check the amount of free disk space before you begin either of these online schema change operations. It will expand the binlogs approximately in proportion to the amount of data to be copied. And of course you need to store two copies of the whole table — the original and the altered version — temporarily, until the original table can be dropped at the end of the process.

I have had to run pt-online-schema change on large tables (500GB+) when I had a disk that was close to being full. It causes some tense moments. I had to PURGE BINARY LOGS periodically to get some more free space, because the schema change would fill the disk to 100% if I didn't! This is not a situation I recommend.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you for the answer @Bill. Regarding the offset at binglog position I'm curious to know when the alter is done we are gonna have a totally new binglog? because we have connected the debezium connector but I want to know if it's gonna be affected – joe gates Feb 22 '22 at 08:08
  • 1
    Just like any other writes, MySQL will append them to the current binlog until it exceeds `max_binlog_size`, then it will start a new file. It will also start a new file if mysqld restarts, or someone executes `FLUSH LOGS`. Debezium better be able to handle new binlog files automatically, shouldn't it? – Bill Karwin Feb 22 '22 at 16:29
  • Thank you Bill. and the last question is how replication can work when we can purge some binlogs. how do they work when we don't have the full history of binglog – joe gates Feb 22 '22 at 19:40
  • 3
    They don't. If you purge binlogs that the replica has not downloaded yet, then that replica cannot catch up. You must destroy the replica and initialize a new replica. At my last job, we typically did this several times per week. So if you can avoid it, don't purge binlogs that are still needed by the replicas. – Bill Karwin Feb 22 '22 at 20:13