1

I have a very large table that I want to add a timestamp column to. It is a table that gets a lot more updates than inserts. I am trying to figure out a way to do this without taking the table out of production for a significant amount of time and it has me in knots. I can do:

alter table stuff add column mod_time timestamp;

Well, I could do this and then the table is locked for 3-5 hours. Not a happy time for users.

For adding a varchar column, for example, I could create a new table, add the column, copy the data over and then replace the old table with the new. This last switch can be done in just a second or two during a slow-ish time. No problems from that. Of course, I may need to do the copy in stages. I can do one copy, and then do a copy of everything that changed after the first copy started. I can repeat this until nothing has changed and do the swap. Of course, it is much easier to determine what changed if I have a "mod_time" column.

What I really would rather to do is: (1) create the new table, (2) copy the data, (3) add the timestamp column, (4) swap the tables.

But step (3) takes, again, hours.

I want rows updated before this column was added to have a mod_time value set to NULL.

If I switch (2) and (3) in the step above, the swap is doable, but I get a mod_time = when I did this stuff to add the column. I want NULL.

I can try to set the values to NULL just before I do the swap, but of course if you update the row to set it to NULL, you are updating the row and the mod_time column gets set to the current time. :-)

I wish I could do: (1) create the new table, (2) add "mod_time" as a datetime column (2) copy the data, (3) change mod_time to a timestamp column, (4) swap the tables.

Theoretically (3) could be done very very quickly (assuming the storage for datetime and timestamp are compatible) because all I am really doing in changing the column to tmestamp is changing its behavior in the future, not its current storage. So this should take no time at all, yes? Probably not.

Just to clarify, if I do:

 alter table stuff add column mod_time timestamp;

I get:

 +----------+--------+----------+---------------------+
 |     col1 |   col2 |     col3 | mod_time            |
 +----------+----- --+----------+---------------------+
 |     5001 |     50 |     2463 | 0000-00-00 00:00:00 |
 |     5002 |     50 |     2467 | 0000-00-00 00:00:00 |
 |     5003 |     50 |     2459 | 0000-00-00 00:00:00 |

This is what I want. It is just that the alter statement will take too long. Everything else I am trying gives me:

 +----------+--------+----------+---------------------+
 |     col1 |   col2 |     col3 | mod_time            |
 +----------+----- --+----------+---------------------+
 |     5001 |     50 |     2463 | 2013-12-05 18:11:21 |
 |     5002 |     50 |     2467 | 2013-12-05 18:11:21 |
 |     5003 |     50 |     2459 | 2013-12-05 18:11:21 |
Ray Kiddy
  • 3,521
  • 3
  • 25
  • 32
  • What kind of usage pattern does your `stuff` table have? Does it have lots of INSERTs, UPDATEs, and DELETEs? Is the timestamp to be the INSERT timestamp? Or will you do a lot of updates on the timestamp? – O. Jones Dec 05 '13 at 22:56
  • Relatively few creates, lots of updates. – Ray Kiddy Dec 05 '13 at 23:05

2 Answers2

0

You should use pt-online-schema-change, a tool for automating ALTER TABLE operations for MySQL, without locking the table. It's part of the Percona Toolkit, a free collection of indispensable helper tools for MySQL developers and DBA's.

$ pt-onlines-chema-change h=localhost,D=mydatabase,t=mytable \
  --alter="add column mod_time timestamp"

You can continue reading and writing the original table while pt-online-schema-change is working. When it's done, it automatically swaps the tables' names, and drops the original.

You can view a recording of a webinar about this tool by its designer. Viewing is free, but requires registration: http://www.percona.com/webinars/2012-05-02-zero-downtime-schema-changes-in-mysql

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Well, that works except that it does the thing where the rows mod_time values are all set to the mod time and not left NULL. – Ray Kiddy Dec 06 '13 at 02:05
  • I suppose I can do an --alter="add column mod_time datetime" and then an --alter="change column mod_time mod_time timestamp". Since neither one is blocking, this would be acceptable. – Ray Kiddy Dec 06 '13 at 02:07
  • Nope. Still have non-NULL values in the mod_time column. Well, for alter table operations where we are not modifying or adding a timestamp column, this tool is sweet. – Ray Kiddy Dec 06 '13 at 02:13
  • You should read about the default behavior of TIMESTAMP columns. This behavior can be overridden so they stay NULL when you add a column. http://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html – Bill Karwin Dec 06 '13 at 05:18
  • 1
    Wow. That is just not very clear. It seems that "alter table stuff add column mod_time TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP;" would work but it does not. One needs to do "alter table stuff add column mod_time TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;" to get the behavior that I want. Well, at least that works! – Ray Kiddy Dec 06 '13 at 20:51
0

How about something like:

CREATE TABLE Stuff_new LIKE Stuff;
ALTER TABLE Stuff_new ADD COLUMN mod_time TIMESTAMP NULL; -- NULL important to get column to default to NULL, not current timestamp
DELIMITER $$
CREATE TRIGGER ON Stuff_AI AFTER INSERT ON Stuff FOR EACH ROW
BEGIN
    INSERT INTO Stuff_new (c1, c2, ..., mod_time) VALUES (NEW.c1, NEW.c2, ..., NULL);
END $$
CREATE TRIGGER ON Stuff_AU AFTER UPDATE ON Stuff FOR EACH ROW
BEGIN
    INSERT INTO Stuff_new (c1,c2,...) VALUES (NEW.c1, NEW.c2, ...)
      ON DUPLICATE KEY UPDATE c1=NEW.c1, c2=NEW.c2, ..., mod_time=NULL;
END $$
CREATE TRIGGER ON Stuff_AD AFTER DELETE ON Stuff FOR EACH ROW
BEGIN
    DELETE FROM Stuff_new WHERE c1=OLD.c1;
END $$
DELIMITER ;

Basically, you're creating the new table structure in a parallel table "Stuff_new", then using triggers on Stuff to replicate the data into Stuff_new as changes are made on Stuff. After time passes, Stuff_new should approximate Stuff enough where you can do a proper copy of missing data over.

Caveat emptor: it's late, I'm tired... this is just a brainstorm! Use at own risk. :)

(Oh, and this might be helpful: ALTER TABLE ADD COLUMN takes a long time)

Community
  • 1
  • 1
bishop
  • 37,830
  • 11
  • 104
  • 139
  • Just a brainstorm and you have basically re-invented mysql replication here. :-) And, yes, a bit complex. I find cold medicine is helping me with the sleeping thing. – Ray Kiddy Dec 06 '13 at 02:24
  • 1
    He's basically re-invented pt-online-schema-change. :-) – Bill Karwin Dec 06 '13 at 05:22