Updates in Redshift are slow because an update is really a sequence of operations performed in a single transaction:
- Select rows to be updated into a temp table
- Delete those rows
- Update those rows in the temp table according to the update condition
- Append the updated rows to the original table
And all that has to be coordinated across nodes.
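In SQL terms, a single-row update behaves roughly like the sketch below. This is a conceptual equivalent, not Redshift's literal internal plan, and the `orders` table with its `order_id` and `status` columns is made up for illustration:

```sql
BEGIN;

-- 1. Select the affected rows into a temp table
CREATE TEMP TABLE orders_to_update AS
SELECT * FROM orders WHERE order_id = 42;

-- 2. Delete those rows from the original table
--    (they are only marked deleted until a VACUUM reclaims the space)
DELETE FROM orders WHERE order_id = 42;

-- 3. Apply the change in the temp table
UPDATE orders_to_update SET status = 'shipped';

-- 4. Append the updated rows back to the original table
INSERT INTO orders SELECT * FROM orders_to_update;

END;
```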
Updating a single row may take as long as updating 1000 rows. What's worse, because updates take so long and require write locks, they block other queries for a long time, significantly affecting overall system performance.
There are 3 ways to make it faster (all from experience):
Avoid updates.
If you have a condition that lets you distinguish new rows from old ones, just append the new rows to the table and add that condition to your queries. You may be surprised to see Redshift performing faster: even though each query becomes a bit more complex, there are no updates overloading the system, so those queries may well run faster (make sure the dist keys are right).
For example, a condition on the maximum timestamp per business key runs surprisingly fast (especially if your business key is your dist key, so it all runs in parallel).
This is the preferred solution.
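A sketch of this append-only pattern, assuming a hypothetical `customer_events` table with `customer_id` as the dist key and an `updated_at` timestamp:

```sql
-- New versions of a record are simply appended; no UPDATE is ever issued.
INSERT INTO customer_events (customer_id, status, updated_at)
VALUES (42, 'shipped', GETDATE());

-- Queries then pick the latest row per business key instead of relying
-- on the table holding exactly one row per key.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY updated_at DESC) AS rn
    FROM customer_events
) latest
WHERE rn = 1;
```

With `customer_id` as the dist key, each node already holds all rows for a given key, so the "latest row" filter runs in parallel without redistributing data.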
Perform updates in batches.
If your update applies to a range of rows, update them all at once with a single WHERE condition. Batches of about 1000 rows work well, though your mileage may vary.
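For example, instead of issuing many single-row updates, one statement can cover the whole batch (table and column names are hypothetical):

```sql
-- One statement touching a whole range of rows costs roughly the same
-- as one single-row update, so do it once per batch.
UPDATE orders
SET status = 'archived'
WHERE order_date BETWEEN '2016-01-01' AND '2016-01-31'
  AND status = 'closed';
```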
Create a staging table into which you store the "new" rows, and then apply them with a single join-based update once that table holds at least 1000 rows.
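A sketch of that staging-table pattern (again, the table and column names are assumptions):

```sql
-- Staging table with the same structure as the target
CREATE TABLE orders_staging (LIKE orders);

-- Incoming changes are inserted into orders_staging instead of being
-- applied to orders one by one.

-- Once the staging table has accumulated enough rows, apply them in a
-- single join-based update and clear the staging table.
UPDATE orders
SET status = s.status
FROM orders_staging s
WHERE orders.order_id = s.order_id;

TRUNCATE orders_staging;
```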