
I have a database which contains more than 30m records, and I need to add two new columns to one of its tables. The problem is that these columns must be NOT NULL, and without a default value. I thought I would just add the columns without the NOT NULL constraint, fill them with data, then add the constraint afterwards, but Redshift doesn't support adding NOT NULL to an existing column. I have another solution in mind, but I wonder if there is any simpler solution than this:

  1. Create the two new columns with NOT NULL and a DEFAULT value.
  2. Fill the columns with the real data.
  3. Create an empty table with the same columns as the target table (with the two new columns just NOT NULL, no DEFAULT).
  4. Insert everything from the target table into the new table.
  5. Drop the target table.
  6. Rename the new table to the target table's name.
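
In outline, the plan above would look something like this (table and column names are placeholders, assuming a `target_table` with new columns `col_a` and `col_b` and a hypothetical `source_of_values` table holding the real data):

```sql
-- 1. Add the columns with NOT NULL and a temporary DEFAULT
ALTER TABLE target_table ADD COLUMN col_a VARCHAR(20) NOT NULL DEFAULT 'placeholder';
ALTER TABLE target_table ADD COLUMN col_b INT NOT NULL DEFAULT 0;

-- 2. Fill the columns with the real data
UPDATE target_table
SET    col_a = s.value_a,
       col_b = s.value_b
FROM   source_of_values s
WHERE  target_table.id = s.id;

-- 3. Create an empty copy where the new columns are NOT NULL without a DEFAULT
CREATE TABLE target_table_new (
    id    BIGINT NOT NULL,
    -- ... existing columns ...
    col_a VARCHAR(20) NOT NULL,
    col_b INT NOT NULL
);

-- 4-6. Copy everything across, then swap the tables
INSERT INTO target_table_new SELECT * FROM target_table;
DROP TABLE target_table;
ALTER TABLE target_table_new RENAME TO target_table;
```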
John Rotenstein

1 Answer


I would suggest:

  • Existing Table-A
  • Create a new Table-B that contains the new columns, plus an identity column (eg customer_id) that matches Table-A.
  • Insert data into Table-B (2 columns + identity column)
  • Use CREATE TABLE AS to simultaneously create a new Table-C (specifying DISTKEY and SORTKEY) while querying Table-A and Table-B via a JOIN on the identity column
  • Verify contents of Table-C
  • VACUUM Table-C (shouldn't be necessary, but just in case, and it should be quick)
  • Delete Table-A and Table-B
  • Rename Table-C to desired table name (which was probably the same as Table-A)
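
A sketch of those steps, assuming a `customer_id` identity column and placeholder names for the two new columns:

```sql
-- Table-B: identity column plus the two new columns
CREATE TABLE table_b (
    customer_id BIGINT      NOT NULL,
    new_col_1   VARCHAR(20) NOT NULL,
    new_col_2   INT         NOT NULL
);

-- Load the new values into Table-B, e.g. via COPY from S3 or INSERT ... SELECT

-- Table-C: build the final table in one pass, sorted and distributed up front
CREATE TABLE table_c
DISTKEY (customer_id)
SORTKEY (customer_id)
AS
SELECT a.*, b.new_col_1, b.new_col_2
FROM   table_a a
JOIN   table_b b ON a.customer_id = b.customer_id;

-- After verifying row counts and contents of Table-C:
DROP TABLE table_a;
DROP TABLE table_b;
ALTER TABLE table_c RENAME TO table_a;
```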

In summary: existing columns in Table-A + extra columns in Table-B → Table-C

Reasoning:

  • UPDATE statements do not run very well in Redshift. Each UPDATE marks the existing data rows for each column as 'deleted', then appends new rows to the end of each column. Doing lots of UPDATEs will blow out the size of a table and leave it unsorted. It's also relatively slow. You would need to deep copy or VACUUM the table afterwards to fix things.
  • Using CREATE TABLE AS with a JOIN will generate all "final state" data in one query and the resulting table will be sorted and in a 'clean' state
  • The process gives you a chance to verify the content of Table-C before committing to the switchover. Very handy for debugging the process!

See also: Performing a Deep Copy - Amazon Redshift
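
For reference, the deep-copy pattern mentioned above is roughly this (assuming a table named `table_a`):

```sql
-- Deep copy: rebuild the table in sorted order instead of running VACUUM
CREATE TABLE table_a_copy (LIKE table_a);
INSERT INTO table_a_copy SELECT * FROM table_a;
DROP TABLE table_a;
ALTER TABLE table_a_copy RENAME TO table_a;
```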
