
I have a database which contains more than 30m records, and I need to add two new columns to one of its tables. The problem is that these columns must be NOT NULL, and without a default value. I thought I would just add the columns without the NOT NULL constraint, fill them with data, then add the constraint afterwards, but Redshift doesn't support adding NOT NULL to an existing column. I have another solution in mind, but I wonder if there is any simpler solution than this:

  1. Create the two new columns with NOT NULL and a DEFAULT value.
  2. Fill the columns with the real data.
  3. Create an empty table with the same columns as the target table (with the two new columns just NOT NULL, no DEFAULT).
  4. Insert everything from the target table into the new table.
  5. Drop the target table.
  6. Rename the new table to the target table's name.
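
In outline, the plan above would look something like this (table and column names are placeholders, assuming a `target_table` with new columns `col_a` and `col_b` and a hypothetical `source_of_values` table holding the real data):

```sql
-- 1. Add the columns with NOT NULL and a temporary DEFAULT
ALTER TABLE target_table ADD COLUMN col_a VARCHAR(20) NOT NULL DEFAULT 'placeholder';
ALTER TABLE target_table ADD COLUMN col_b INT NOT NULL DEFAULT 0;

-- 2. Fill the columns with the real data
UPDATE target_table
SET    col_a = s.value_a,
       col_b = s.value_b
FROM   source_of_values s
WHERE  target_table.id = s.id;

-- 3. Create an empty copy where the new columns are NOT NULL without a DEFAULT
CREATE TABLE target_table_new (
    id    BIGINT NOT NULL,
    -- ... existing columns ...
    col_a VARCHAR(20) NOT NULL,
    col_b INT NOT NULL
);

-- 4-6. Copy everything across, then swap the tables
INSERT INTO target_table_new SELECT * FROM target_table;
DROP TABLE target_table;
ALTER TABLE target_table_new RENAME TO target_table;
```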
John Rotenstein

1 Answer


I would suggest:

  • Existing Table-A
  • Create a new Table-B that contains the new columns, plus an identity column (eg customer_id) that matches Table-A.
  • Insert data into Table-B (2 columns + identity column)
  • Use CREATE TABLE AS to simultaneously create a new Table-C (specifying DISTKEY and SORTKEY) while querying Table-A and Table-B via a JOIN on the identity column
  • Verify contents of Table-C
  • VACUUM Table-C (shouldn't be necessary, but just in case, and it should be quick)
  • Delete Table-A and Table-B
  • Rename Table-C to desired table name (which was probably the same as Table-A)
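
A sketch of those steps, assuming a `customer_id` identity column and placeholder names for the two new columns:

```sql
-- Table-B: identity column plus the two new columns
CREATE TABLE table_b (
    customer_id BIGINT      NOT NULL,
    new_col_1   VARCHAR(20) NOT NULL,
    new_col_2   INT         NOT NULL
);

-- Load the new values into Table-B, e.g. via COPY from S3 or INSERT ... SELECT

-- Table-C: build the final table in one pass, sorted and distributed up front
CREATE TABLE table_c
DISTKEY (customer_id)
SORTKEY (customer_id)
AS
SELECT a.*, b.new_col_1, b.new_col_2
FROM   table_a a
JOIN   table_b b ON a.customer_id = b.customer_id;

-- After verifying row counts and contents of Table-C:
DROP TABLE table_a;
DROP TABLE table_b;
ALTER TABLE table_c RENAME TO table_a;
```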

In summary: existing columns in Table-A + extra columns in Table-B → Table-C

Reasoning:

  • UPDATE statements do not run very well in Redshift. Each UPDATE marks the existing data rows for each column as 'deleted', then appends new rows to the end of each column. Doing lots of UPDATEs will blow out the size of a table and leave it unsorted. It's also relatively slow. You would need to deep copy or VACUUM the table afterwards to fix things.
  • Using CREATE TABLE AS with a JOIN will generate all "final state" data in one query and the resulting table will be sorted and in a 'clean' state
  • The process gives you a chance to verify the content of Table-C before committing to the switchover. Very handy for debugging the process!

See also: Performing a Deep Copy - Amazon Redshift
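
For reference, the deep-copy pattern mentioned above is roughly this (assuming a table named `table_a`):

```sql
-- Deep copy: rebuild the table in sorted order instead of running VACUUM
CREATE TABLE table_a_copy (LIKE table_a);
INSERT INTO table_a_copy SELECT * FROM table_a;
DROP TABLE table_a;
ALTER TABLE table_a_copy RENAME TO table_a;
```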
