8

Given a structure like this:

CREATE TABLE reference_table (
  reference_table_key numeric NOT NULL,
  reference_value numeric,
  CONSTRAINT reference_table_pk PRIMARY KEY (reference_table_key)
);

CREATE TABLE other_table (
  other_table_key numeric NOT NULL,
  reference_table_key numeric,
  CONSTRAINT other_table_pk PRIMARY KEY (other_table_key),
  ONSTRAINT other_table_reference_fk FOREIGN KEY (reference_table_key)
      REFERENCES reference_table (reference_table_key) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE SET NULL
);

CREATE TABLE another_table (
  another_table_key numeric NOT NULL,
  do_stuff_key numeric,
  CONSTRAINT another_table_pk PRIMARY KEY (another_table_key),
  ONSTRAINT another_table_reference_fk FOREIGN KEY (do_stuff_key)
      REFERENCES reference_table (reference_table_key) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE SET NULL
);

--there are 50-60 tables which have similar foreign key references to reference_table

I want to write a query that tells me the primary keys in other_table and another_table and potentially more tables where reference_value is NULL.

In psuedo-code:

SELECT table_name, table_primary_key, table_fk_column_name
FROM ?????? some PG table ???????, reference_table
WHERE reference_table.reference_value IS NULL;

The result would look something like:

table_name | table_primary_key | table_fk_column_name | reference_table_pk
---------------------------------------------------------------------------
other_table   | 2                |  reference_table_key | 7
other_table   | 4                |  reference_table_key | 56
other_table   | 45               |  reference_table_key | 454
other_table   | 65765            |  reference_table_key | 987987
other_table   | 11               |  reference_table_key | 3213
another_table | 3                |  do_stuff_key        | 4645
another_table | 5                |  do_stuff_key        | 43546
another_table | 7                |  do_stuff_key        | 464356
unknown_table | 1                |  unkown_column_key   | 435435
unknown_table | 1                |  some_other_column_key | 34543
unknown_table | 3                |  unkown_column_key   | 124
unknown_table | 3                |  some_other_column_key | 123

This is similar to, but not a duplicate of Postgres: SQL to list table foreign keys . That question shows the table structure. I want to find specific instances.

Essentially if I were to DELETE FROM reference_table WHERE reference_value IS NULL;, postgres has to do something internally to figure out that it needs to set reference_table_key in row 2 in other_table to NULL. I want to see what those rows would be.

Is there a query that can do this? Is there a modifier that I can pass to a DELETE call that would tell me what tables/rows/columns would be affected by that DELETE?

Community
  • 1
  • 1
Freiheit
  • 8,408
  • 6
  • 59
  • 101

2 Answers2

7

NULL values in referencing columns

This query produces the DML statement to find all rows in all tables, where a column has a foreign-key constraint referencing another table but hold a NULL value in that column:

WITH x AS (
 SELECT c.conrelid::regclass    AS tbl
      , c.confrelid::regclass   AS ftbl
      , quote_ident(k.attname)  AS fk
      , quote_ident(pf.attname) AS pk
 FROM   pg_constraint c
 JOIN   pg_attribute  k ON (k.attrelid, k.attnum) = (c.conrelid, c.conkey[1])
 JOIN   pg_attribute  f ON (f.attrelid, f.attnum) = (c.confrelid, c.confkey[1])
 LEFT   JOIN pg_constraint p  ON p.conrelid = c.conrelid AND p.contype = 'p'
 LEFT   JOIN pg_attribute  pf ON (pf.attrelid, pf.attnum)
                               = (p.conrelid, p.conkey[1])
 WHERE  c.contype   = 'f'
 AND    c.confrelid = 'fk_tbl'::regclass  -- references to this tbl
 AND    f.attname   = 'fk_tbl_id'         -- and only to this column
)
SELECT string_agg(format(
'SELECT %L AS tbl
     , %L AS pk
     , %s::text AS pk_val
     , %L AS fk
     , %L AS ftbl
FROM   %1$s WHERE %4$s IS NULL'
                  , tbl
                  , COALESCE(pk 'NONE')
                  , COALESCE(pk 'NULL')
                  , fk
                  , ftbl), '
UNION ALL
') || ';'
FROM   x;

Produces a query like this:

SELECT 'some_tbl' AS tbl
     , 'some_tbl_id' AS pk
     , some_tbl_id::text AS pk_val
     , 'fk_tbl_id' AS fk
     , 'fk_tbl' AS ftbl
FROM   some_tbl WHERE fk_tbl_id IS NULL
UNION ALL
SELECT 'other_tbl' AS tbl
     , 'other_tbl_id' AS pk
     , other_tbl_id::text AS pk_val
     , 'some_name_id' AS fk
     , 'fk_tbl' AS ftbl
FROM   other_tbl WHERE some_name_id IS NULL;

Produces output like this:

    tbl    |     pk       | pk_val |    fk        |  ftbl
-----------+--------------+--------+--------------+--------
 some_tbl  | some_tbl_id  | 49     | fk_tbl_id    | fk_tbl
 some_tbl  | some_tbl_id  | 58     | fk_tbl_id    | fk_tbl
 other_tbl | other_tbl_id | 66     | some_name_id | fk_tbl
 other_tbl | other_tbl_id | 67     | some_name_id | fk_tbl
  • Does not cover multi-column foreign or primary keys reliably. You have to make the query more complex for this.

  • I cast all primary key values to text to cover all types.

  • Adapt or remove these lines to find foreign key pointing to an other or any column / table:

    AND    c.confrelid = 'fk_tbl'::regclass
    AND    f.attname = 'fk_tbl_id' -- and only this column
    
  • Tested with PostgreSQL 9.1.4. I use the pg_catalog tables. Realistically nothing of what I use here is going to change, but that is not guaranteed across major releases. Rewrite it with tables from information_schema if you need it to work reliably across updates. That is slower, but sure.

  • I did not sanitize table names in the generated DML script, because quote_ident() would fail with schema-qualified names. It is your responsibility to avoid harmful table names like "users; DELETE * FROM users;". With some more effort, you can retrieve schema-name and table name separately and use quote_ident().


NULL values in referenced columns

My first solution does something subtly different from what you ask, because what you describe (as I understand it) is non-existent. The value NULL is "unknown" and cannot be referenced. If you actually want to find rows with a NULL value in a column that has FK constraints pointing to it (not to the particular row with the NULL value, of course), then the query can be much simplified:

WITH x AS (
 SELECT c.confrelid::regclass   AS ftbl
       ,quote_ident(f.attname)  AS fk
       ,quote_ident(pf.attname) AS pk
       ,string_agg(c.conrelid::regclass::text, ', ') AS referencing_tbls
 FROM   pg_constraint c
 JOIN   pg_attribute  f ON (f.attrelid, f.attnum) = (c.confrelid, c.confkey[1])
 LEFT   JOIN pg_constraint p  ON p.conrelid = c.confrelid AND p.contype = 'p'
 LEFT   JOIN pg_attribute  pf ON (pf.attrelid, pf.attnum)
                               = (p.conrelid, p.conkey[1])
 WHERE  c.contype = 'f'
 -- AND    c.confrelid = 'fk_tbl'::regclass  -- only referring this tbl
 GROUP  BY 1, 2, 3
)
SELECT string_agg(format(
'SELECT %L AS ftbl
     , %L AS pk
     , %s::text AS pk_val
     , %L AS fk
     , %L AS referencing_tbls
FROM   %1$s WHERE %4$s IS NULL'
                  , ftbl
                  , COALESCE(pk, 'NONE')
                  , COALESCE(pk, 'NULL')
                  , fk
                  , referencing_tbls), '
UNION ALL
') || ';'
FROM   x;

Finds all such rows in the entire database (commented out the restriction to one table). Tested with Postgres 9.1.4 and works for me.

I group multiple tables referencing the same foreign column into one query and add a list of referencing tables to give an overview.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This is very very close. `SELECT 'some_tbl' AS tbl ,'some_tbl_id' AS pk ,some_tbl_id::text AS pk_val ,'some_col' AS fk ,'fk_tbl' AS ftbl FROM some_tbl WHERE some_col IS NULL` I think that needs to introduce a JOIN. I want to find columns in `some_table` which are a foreign key to `reference_table` where `reference_table.reference_value` is NULL. I think Erwin's query finds places where `some_table.some_col` is NULL and is a foreign key to `reference_table`. – Freiheit Aug 03 '12 at 14:39
  • Got it! `,''' || ftbl || ''' AS ftbl FROM ' || tbl || ' INNER JOIN fk_tbl AS fkt ON ce.coded_element_key = ' || tbl || '.' || fk_col || ' WHERE fkt.reference_value IS NULL', '` – Freiheit Aug 03 '12 at 14:47
  • @Freiheit: My solution does something subtly different from what you ask, because what you describe (as I understand it) is non-existent. The value `NULL` is unknown and cannot be referenced. If you just want to find rows with a NULL value in a column that has fk-constraints pointing to it, then the query can be much simplified. – Erwin Brandstetter Aug 03 '12 at 14:47
  • The root of my problem is that I have a table `reference_table`, which should have a value in `reference_value` but does not. This has many foreign key references from other tables in our schema. My team suspects that there is a bug in some code which is not assigning a value to 'reference_value' when storing data on `some_table`. To find out what code might not be properly setting `reference_value` we have to know the tables/rows from `some_table` which link to it. This is trivial to solve if we only have a few tables with foreign keys to `reference_table`.Your query solves it for many tables – Freiheit Aug 03 '12 at 14:59
  • @Freiheit: Cool that it helps! I added the (simpler) variant to find `NULL` values in referenced columns - for the general public. – Erwin Brandstetter Aug 03 '12 at 15:06
  • New solution does not work. It actually generates the same query over and over! – Freiheit Aug 03 '12 at 15:48
  • @Freiheit: I added grouping and some clarification and tested the second query. Provided for corner case where pkey is missing in both queries. Should work better now. In principal the second was working already, just very redundant with many referencing tables. – Erwin Brandstetter Aug 03 '12 at 18:19
0

You want a union for this query:

select *
from ((select 'other_table' as table_name,
               other_table_key as primary_key,
               'reference_table_key' as table_fk,
               ot.reference_table_key
       from other_table ot left outer join
            reference_table rt
            on ot.reference_table_key = rt.reference_table_key
       where rt.reference_value is null
      ) union all
      (select 'another_table' as table_name,
               another_table_key as primary_key,
               'do_stuff_key' as table_fk,
               at.do_stuff_key
       from another_table at left outer join
            reference_table rt
            on at.do_stuff_key = rt.reference_table_key
       where rt.reference_value is null
      )
     ) t
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This solution is not ideal. I've listed two example tables, but there are approximately 50 tables in total that may contain a foreign key to reference_table. I'm trying to avoid doing an operation over and over by hand for each table. – Freiheit Aug 02 '12 at 19:36
  • You may be able to use the tables in information_schema to create the query that you want. At a minimum, you could list the tables and keys in two columns in Excel and generate each subquery using Excel functions. – Gordon Linoff Aug 02 '12 at 20:15
  • Thanks for the clue. I'll read up on information_schema and see what I can fish out. – Freiheit Aug 02 '12 at 20:22