We've been adding comments to the columns in postgres as column descriptions. Similarly, there are descriptions in dbt that can be written.
How would I go about writing SQL to automatically setting the same descriptions in postgres into dbt docs?
We've been adding comments to the columns in postgres as column descriptions. Similarly, there are descriptions in dbt that can be written.
How would I go about writing SQL to automatically setting the same descriptions in postgres into dbt docs?
Here's how I often do it.
Take a look at this answer on how to pull descriptions from the pg.catalog.
From there, you want to write a BQ query that generates a json which you can then convert to a yaml file you can use directly in dbt.
BQ link - save results as JSON file.
Use a json2yaml tool.
Save yaml file to an appropriate place in your project tree.
Code sample:
-- intended to be saved as JSON and converted to YAML
-- ex. cat script_job_id_1.json | python3 json2yaml.py | tee schema.yml
-- version will be created as version:'2' . Remove quotes after conversion
DECLARE database STRING;
DECLARE dataset STRING;
DECLARE dataset_desc STRING;
DECLARE source_qry STRING;
SET database = "bigquery-public-data";
SET dataset = "census_bureau_acs";
SET dataset_desc = "";
SET source_qry = CONCAT('''CREATE OR REPLACE TEMP TABLE tt_master_table AS ''',
'''(''',
'''SELECT cfp.table_name, ''',
'''cfp.column_name, ''',
'''cfp.description, ''',
'''FROM `''', database, '''`.''', dataset, '''.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cfp ''',
''')''');
EXECUTE IMMEDIATE source_qry;
WITH column_info AS (
SELECT table_name as name,
ARRAY_AGG(STRUCT(column_name AS name, COALESCE(description,"") AS description)) AS columns
FROM tt_master_table
GROUP by table_name
)
, table_level AS (
SELECT CONCAT(database, ".", dataset) AS name,
database,
dataset,
dataset_desc AS `description`,
ARRAY_AGG(
STRUCT(name, columns)) AS tables
FROM column_info
GROUP BY database,
dataset,
dataset_desc
LIMIT 1)
SELECT CAST(2 AS INT) AS version,
ARRAY_AGG(STRUCT(name, database, dataset, description, tables)) AS sources
FROM table_level
GROUP BY version