0

I am trying to refactor a query and remove a set of window function to instead capture all records as is. When I go to paste query in to replace my function, pgAdmin indicates a syntax error near the 'Truncate' statement. I just cant figure out what it is and is likely something ridiculously obvious but would appreciate another (or many) set of eyes.

I am on Postgres 10.12.

Here is the code I am trying to insert (have included the 'before' code after it in the end. Any advice would be much appreciated. Have tried all types and combos of bracket, comma and semi-colon closure and at wits end ...

Code I am trying to insert

Intended Outcome: Take rows from staging and insert into target. Truncate staging when done

  WITH ins_agent AS (
     INSERT INTO agents (
        currentagentsnapshot,
        previousagentsnapshot,
        agentarn,
        eventid,
        eventtimestamp,
        eventtype,
        instancearn)
     SELECT 
        agent_stg.currentagentsnapshot::jsonb AS currentagentsnapshot,
        agent_stg.previousagentsnapshot::jsonb AS previousagentsnapshot,
        agent_stg.agentarn::text AS agentarn,
        agent_stg.eventid::text AS eventid,
        agent_stg.eventtimestamp::timestamptz AS eventtimestamp,
        agent_stg.eventtype::text AS eventtype,
        agent_stg.instancearn::text AS instancearn
    FROM agents_staging AS agent_stg
    WHERE agent_stg.agentarn IS NOT NULL)
    TRUNCATE agents_staging;

Code I am trying to change

Intended Outcome: Select the distinct agent and capture their most recent snapshots. Insert the agent_id into another table. Account for upserts.

Reason for change: Aside from being a bear of a query to run on millions of records, it provides little value as it discards all agent snapshots except their most current because of the distinct. Not what we were going for.

WITH ins_agent AS (INSERT INTO agents (currentagentsnapshot,
                                       previousagentsnapshot,
                                       agentarn,
                                       eventid,
                                       eventtimestamp,
                                       eventtype,
                                       instancearn)
    SELECT DISTINCT LAST_VALUE(currentagentsnapshot::jsonb)
                    OVER (PARTITION BY agentarn ORDER BY lat.eventtimestamp DESC ) AS previousagentsnapshot,
                    LAST_VALUE(previousagentsnapshot::jsonb)
                    OVER (PARTITION BY agentarn ORDER BY lat.eventtimestamp DESC ) AS previousagentsnapshot,
                    agentarn,
                    LAST_VALUE(eventid)
                    OVER (PARTITION BY agentarn ORDER BY lat.eventtimestamp DESC)  AS eventid,
                    LAST_VALUE(lat.eventtimestamp)
                    OVER (PARTITION BY agentarn ORDER BY lat.eventtimestamp DESC)  AS eventtimestamp,
                    LAST_VALUE(eventtype)
                    OVER (PARTITION BY agentarn ORDER BY lat.eventtimestamp DESC)  AS eventtype,
                    LAST_VALUE(instancearn)
                    OVER (PARTITION BY agentarn ORDER BY lat.eventtimestamp DESC)  AS instancearn
    FROM agents_staging AS agent
             CROSS JOIN LATERAL (SELECT agent.eventtimestamp::timestamp AS eventtimestamp) AS lat
    WHERE NOT (
            agent.agentarn IS NULL
            OR agent.currentagentsnapshot IS NULL
            OR agent.currentagentsnapshot = '{}')
    ON CONFLICT (agentarn)
        DO NOTHING
    RETURNING agentarn, agent_id)
UPDATE contact_trace_records
SET agent_id=ins_agent.agent_id
FROM ins_agent
WHERE contact_trace_records.agent_id IS NULL
  AND ins_agent.agentarn = contact_trace_records.agent ->> 'ARN';

TRUNCATE agents_staging;

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
sandeepmohan
  • 79
  • 12
  • Not sure if Aurora syntax is different than vanilla PostgreSql, but is it as simple as you have a parens instead of a semicolon after `WHERE agent_stg.agentarn IS NOT NULL)` before the `TRUNCATE`? – mdisibio May 20 '21 at 23:32
  • @mdisibio - That's the frustrating thing though. So the parentheses is there to close out the with clause at the top of the line. And then I put a semi colon after to end the with. That threw an error. Tried to wrap in truncate with a semicolon after the 'where' and that throws an error. Tried to even leave the 'with' not closed but that throws an error too. Ughhh... – sandeepmohan May 20 '21 at 23:59

1 Answers1

3

The syntax error is that CTEs cannot be attached to a TRUNCATE command. Only to INSERT, UPDATE, DELETE, or SELECT. (Well, technically also TABLE, but that's just short syntax for SELECT * FROM.)

Remove the WITH clause, and run two separate commands.

INSERT INTO agents (
    currentagentsnapshot,
    previousagentsnapshot,
    agentarn,
    eventid,
    eventtimestamp,
    eventtype,
    instancearn)
SELECT 
    agent_stg.currentagentsnapshot::jsonb AS currentagentsnapshot,
    agent_stg.previousagentsnapshot::jsonb AS previousagentsnapshot,
    agent_stg.agentarn::text AS agentarn,
    agent_stg.eventid::text AS eventid,
    agent_stg.eventtimestamp::timestamptz AS eventtimestamp,
    agent_stg.eventtype::text AS eventtype,
    agent_stg.instancearn::text AS instancearn
FROM agents_staging AS agent_stg
WHERE agent_stg.agentarn IS NOT NULL;

TRUNCATE agents_staging;

If you need to defend against race conditions, wrap the whole operation in a transaction and take a write lock on the table first:

BEGIN;
LOCK TABLE agents_staging;
INSERT ...;
TRUNCATE agents_staging;
COMMIT;

Aside: table aliases in the outer SELECT do nothing in a INSERT command. Those are documentation at best. But since all columns already have their eventual name, it's really just noise.

You probably don't need table qualification or the explicit casts either, if the source columns have compatible data types (with an implicit or assignment cast to the target type, not an explicit cast). Looks like both tables to have identical data types to begin with? Then you can largely simplify to:

INSERT INTO agents
      (currentagentsnapshot, previousagentsnapshot, agentarn, eventid, eventtimestamp, eventtype, instancearn)
SELECT currentagentsnapshot, previousagentsnapshot, agentarn, eventid, eventtimestamp, eventtype, instancearn
FROM   agents_staging
WHERE  agentarn IS NOT NULL;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for the fix and the advice. Noted on the aliasing. Instinct, I guess but good to know its not required. The explicit casts are because the staging tables get it as text as Glue cant transport jsonb over jdbc. – sandeepmohan May 21 '21 at 03:12
  • Not sure what you are referring to regarding the race conditions but didnt want to wrap it as a separate transaction as thee are 3 other tables that are part of the same function prior to the commit. Stage for agents and the others load asynchronously so had this function as part of a trigger on a logging table that would be written to as data arrived in either of the stage tables. Once again, thanks very much – sandeepmohan May 21 '21 at 03:16
  • 1
    @sandeepmohan: Example for race condition: A concurrent transaction changes any row into `agents_staging` *after* your `INSERT` started. Then `TRUNCATE` is first blocked until that transaction finishes, and then the changes are lost. But a table name `agents_staging` would indicate that nobody else writes to that table ... – Erwin Brandstetter May 21 '21 at 13:22