how to rewrite query to put data-modifying CTE at top level

Question

I'm trying to write a query for Postgres to insert a row into one table which as a side-effect may insert a row into a second table (due to the first table having a foreign key constraint on the second).

I suppose I could do this as two separate queries, possibly inside a transaction, but now that I've started down the path of a single query, I'd like to know how it might be done, if it's possible at all.

I've been trying to use the general structure laid out in How to use RETURNING with ON CONFLICT in PostgreSQL? and Insert if names do not exist and return ids if exist, which seemed like a reasonable starting point, but I'm getting the error WITH clause containing a data-modifying statement must be at the top level, and I'm not sure how to follow that advice.

Table a is something like this:

CREATE TABLE a (
    a_uuid uuid PRIMARY KEY,
    b_uuid uuid REFERENCES b(b_uuid) 
)

and table b is:

CREATE TABLE b (
    b_uuid uuid PRIMARY KEY,
    b_name text UNIQUE
)

I want to insert a row into a where b_uuid is computed based on input that'll match b_name. If b already has the row, then use the corresponding b_uuid; otherwise, create a new row in b with that text and a newly-generated UUID, returning the latter to use as part of the insert into a. This was what I had written before I realized I was in over my head:

WITH new_b AS (
        WITH input (b_uuid, b_name) AS (
                VALUES (
                        gen_random_uuid(), $1
                )
        ), ins AS (
                INSERT INTO b (
                        b_uuid, b_name
                )
                TABLE input
                ON CONFLICT DO NOTHING
                RETURNING b_uuid
        )
        TABLE ins
        UNION ALL
        SELECT b_uuid FROM input
)
INSERT INTO a (a_uuid, b_uuid)
VALUES (
        gen_random_uuid(), (SELECT b_uuid FROM new_b)
)

Am I anywhere close? What would be the best way of doing this?

S-Man · Accepted Answer · 2019-06-05T07:44:31.047

1

demo:db<>fiddle (because of random things, you may reload several times if the random uuid equals 1; instead of type uuid I used int because the fiddle engine currently does not support the pgcrypto extension. I simulated the function with an own one.)

WITH input (b_uuid, b_name) AS (
    VALUES (
        gen_random_uuid(), $1
    )
), ins_b AS (
    INSERT INTO b (
        b_uuid, b_name
    )
    TABLE input
    ON CONFLICT DO NOTHING
    RETURNING b_uuid
), new_b AS (
    TABLE ins_b
    UNION ALL
    SELECT b.b_uuid FROM input
    JOIN b USING (b_uuid)
)
INSERT INTO a (a_uuid, b_uuid)
VALUES (
    gen_random_uuid(), (SELECT b_uuid FROM new_b)
);

Your solution is not far away:

Manipulating statements (like INSERT) cannot be inside nested WITH clauses (At this point: Thank you, I didn't even know about this nested CTE feature :D)
The main point of @ErwinBrandstetter's incredible fantastic solution (How to use RETURNING with ON CONFLICT in PostgreSQL?) is the JOIN that your solution is missing: The new_b part works as follows: If there is a conflict, ins_b returns nothing. So, TABLE ins_b is empty. In that case the already existing b_uuid needs to be called directly from TABLE b. Taking the generated UUID, joining it against b gives out the existing b_uuid (and if you wish, every other column of this record). But if there is no conflict - the b_uuid does not exist yet -, then ins_b returns the new data set, TABLE ins_b is not empty, but the join on the original table fails because there is still no record persisted which can be used to join. This, of course, works for more than one record to be inserted.

edited Jun 05 '19 at 07:44

answered Jun 05 '19 at 07:27

S-Man

22,521
7
40
63

Thank you; that was a re-orientation I couldn't get my brain to do. But it doesn't work for me. I took your fiddle and made [another one](https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=0e3e147958be230e29634d910727cb31) which adds a second `ABC` row, and that row got its own "UUID", rather than sharing the ID that was generated the first time. (And in my actual code, I'm getting NULLs for the extra rows, so something is getting lost in translation.) I'll keep fiddling. – Danek Duvall Jun 05 '19 at 20:56
The translation error seems to be that `b_name` needs to be `UNIQUE`. Correcting that in the [fiddle](https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=c983f8cc6cb3475a4c37e09cce74ad85) gives me one of two behaviors. The first is the one I'm seeing, where one of the two rows in `a` references a row in `b`, and the other has a NULL there, matches what I see in my code. The second has one row referencing `ABC` as expected, and one row referencing `XYZ`, which is not. I assume this is because it's randomly generating `b_uuid` as 1. – Danek Duvall Jun 05 '19 at 22:09
So I'm getting what I need from [this](https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=f1c4447377e00a2162bec3bc62b7606d). The only change is in `new_b`. Maybe I don't entirely get the point of the `JOIN`, but I think it's how I was getting the errant `NULL`s. Since I only want the newly generated UUID if it got inserted, then I can ignore `input` in favor of `ins_b`, and if it wasn't inserted, that's because it was already in `b`, so I pull it out of there directly. – Danek Duvall Jun 06 '19 at 18:39
@DanekDuvall Sry for delay. Currently I am at holidays. I am not quite sure: Did you solve your problem or not. Your comments are not clear to me. Maybe you should create a fiddle which shows your problem more clearly? – S-Man Jun 10 '19 at 12:05
Yes, thank you; I've gotten something working (link in third comment), though not by using the `JOIN` construction. The link in my first comment shows "duplicate" rows due to my not having expressed the constraint correctly in the first place (my apologies). The link in my second comment fixes that and shows `NULL` entries where I shouldn't have any, and then once I ditch the `JOIN`, and just `SELECT` from the original table, it all works. Whether there's something better, I don't know; I didn't try too hard to figure out how to make `JOIN` work. Enjoy your holidays! – Danek Duvall Jun 10 '19 at 16:50
1. The JOIN generally does exacte the same as your WHERE clause: If there is a record with the given name it shows it. Otherwise it gives out nothing. In your case you could use the WHERE clause as well but then you are limited to a static number of inserts. The JOIN is more general in case you want to insert more than just one records https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=be5c61827902164f40ddae7babd12b6b – S-Man Jun 11 '19 at 08:05
2. The NULL values come from the UNIQUE clause. Because you are trying to insert ABC two times the second one fails (but does NOTHING). As described above this would result in an empty `TABLE ins_b`. But since there is a new b_uuid nevertheless (it fails on UNIQUE name not on the primary key!) the second part of the UNION founds nothing as well. So it returns an empty b_uuid (no new has been inserted, but the new one is still not existing in the actual table). This generates the NULL value. – S-Man Jun 11 '19 at 08:10
I see. And changing the thing I insert into `a` from `VALUES (gen_random_uuid(), (SELECT b_uuid FROM new_b))` to `(SELECT gen_random_uuid(), b_uuid FROM new_b)` (as you did in your latest fiddle) means it's inserting no rows, rather than a new row with a new `a_uuid` and a `NULL` `b_uuid`. – Danek Duvall Jun 12 '19 at 01:19
And the last bit is that I need to `JOIN` not on `b_uuid`, but on `b_name`. Otherwise, I don't get a second row in `a` (which should have the same `b_uuid` as in the first row). I'd updated my actual code to do the equivalent, and been unable to see the difference in the code in the fiddle, but now I think I have it all pieced together ([updated fiddle](https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=63ffceefcb93b763946b97307bbaa417)). Thank you! – Danek Duvall Jun 12 '19 at 01:25

how to rewrite query to put data-modifying CTE at top level

1 Answers1