We are trying to set up dev and qa environments using data from a prod environment.
We are not using `CREATE DATABASE dev CLONE prod` because we want to avoid cloning database-specific objects like stages and pipes: we manage pipe-related objects with per-environment Terraform, and cloning them would introduce out-of-band changes to those objects.
On top of that, there are some tables that should not be cloned from prod->dev. I'm trying to design a cleaner solution than the cobbled-together mess we have.
We have a scheduled script that does the following:
- Connect to the `prod` and `dev` databases and fetch the right `src` and `dst` schemas
- Run `SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<>' AND TABLE_TYPE = 'BASE TABLE'` to get the tables to clone
- Cloning tables across databases leaves dangling references to constraints and sequences, so those have to be fixed manually (see https://docs.snowflake.com/en/user-guide/database-replication-considerations.html?#references-to-objects-in-another-database)
- For each table:
  - If it shouldn't be cloned, skip it
  - Run `CREATE OR REPLACE TABLE <dev> CLONE <prod> COPY GRANTS;`
  - Run `GET_DDL(<dev>)` to see if the table has sequences/constraints to update
  - Run `CREATE OR REPLACE SEQUENCE <dev> CLONE <prod>` to update the `nextval` of the sequence, since the cloned table still references the sequence in the source database (and that sequence has the wrong value anyway)
  - Run `ALTER TABLE <dev> ALTER COLUMN <> SET DEFAULT <new seq>.nextval`
  - Check if there are constraints:
    - Run `ALTER TABLE <dev> DROP CONSTRAINT <>`, since the cloned constraints reference the source database
    - Run `ALTER TABLE <dev> ADD CONSTRAINT <>` to rebuild them against the destination database
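For context, the per-table repair steps above can be sketched as a pure function that emits the ordered SQL statements, which keeps the custom rules testable outside Snowflake. This is only a sketch: the function name, parameter shapes, and all identifiers in it are illustrative assumptions, and the sequence/constraint metadata would come from parsing `GET_DDL` output or your own config.

```python
# Sketch: generate the ordered SQL statements for cloning one table from
# prod to dev and repairing its sequence/constraint references.
# All names and signatures here are illustrative, not our actual script.

def clone_statements(src_db, dst_db, schema, table,
                     sequences=(), constraints=()):
    """sequences: iterable of (seq_name, column) pairs whose DEFAULT must be
    repointed at the destination database.
    constraints: iterable of (name, definition) pairs to drop and rebuild.
    """
    src = f"{src_db}.{schema}.{table}"
    dst = f"{dst_db}.{schema}.{table}"
    stmts = [f"CREATE OR REPLACE TABLE {dst} CLONE {src} COPY GRANTS"]
    for seq, column in sequences:
        # Re-clone the sequence so nextval carries over from the source,
        # then repoint the column default at the destination sequence.
        stmts.append(f"CREATE OR REPLACE SEQUENCE {dst_db}.{schema}.{seq} "
                     f"CLONE {src_db}.{schema}.{seq}")
        stmts.append(f"ALTER TABLE {dst} ALTER COLUMN {column} "
                     f"SET DEFAULT {dst_db}.{schema}.{seq}.NEXTVAL")
    for name, definition in constraints:
        # Cloned constraints still reference the source database, so drop
        # them and recreate them against the destination.
        stmts.append(f"ALTER TABLE {dst} DROP CONSTRAINT {name}")
        stmts.append(f"ALTER TABLE {dst} ADD CONSTRAINT {name} {definition}")
    return stmts
```

Keeping statement generation separate from execution means the skip rules and repair logic can be unit-tested without a warehouse connection.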
So... it works, but it's hacky, fragile, and needs constant updating because of the custom rules. It currently runs on an AWS Lambda, but a first step would be to migrate it to pure Snowflake.
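One way the "pure Snowflake" step might look is moving the loop into a Python stored procedure (Snowpark) scheduled by a `TASK` instead of a Lambda. A minimal sketch, assuming a Snowpark `Session` (here the loop only needs an object with a `.sql()` method, so it can be exercised with a stub); `SKIP_TABLES`, the database names, and the schema are illustrative assumptions:

```python
# Sketch of the orchestration loop as it might run inside a Snowflake
# Python stored procedure. `session` would be a Snowpark Session in
# Snowflake; this code only relies on session.sql(text).collect().

SKIP_TABLES = {"AUDIT_LOG", "PII_CUSTOMERS"}  # illustrative: never cloned to dev

def refresh_dev(session, src_db="PROD", dst_db="DEV", schema="PUBLIC"):
    rows = session.sql(
        f"SELECT TABLE_NAME FROM {src_db}.INFORMATION_SCHEMA.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema}' AND TABLE_TYPE = 'BASE TABLE'"
    ).collect()
    cloned = []
    for row in rows:
        table = row["TABLE_NAME"]
        if table in SKIP_TABLES:
            continue
        session.sql(
            f"CREATE OR REPLACE TABLE {dst_db}.{schema}.{table} "
            f"CLONE {src_db}.{schema}.{table} COPY GRANTS"
        ).collect()
        # ...per-table sequence/constraint repair would go here...
        cloned.append(table)
    return cloned
```

Registering this as a stored procedure and scheduling it with `CREATE TASK ... SCHEDULE = ...` would remove the Lambda entirely, at the cost of moving the code into Snowflake's packaging model.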
Does anyone have any suggestions to improve this process, or at least recommendations on Snowflake tools that could help?