1

I'm trying to expand on my understanding of CTE's and window partitions as used by PostgreSQL 9.1. (Example taken from my other question How to remove duplicate rows.) The situation is that cdesc is a text field that (from a previous database) had unnecessary characters added to keep each cdesc text entry different (and unique).

The problem is how to combine all entries in table CPT that match the string pattern of "excision/biopsy pathology".

Doing this fails since the "plan" is grouping the complete (and unique) cdesc - not just the matching part:

WITH plan AS (
SELECT recid, cdesc, min(recid) OVER (PARTITION BY cdesc) AS master_recid
FROM cpt
 WHERE cpt.cdesc LIKE '%excision/biopsy pathology%'
   )
, upd_lab AS (
   UPDATE lab l
   SET    cpt_recid = p.master_recid   
   FROM   plan p
   WHERE  l.cpt_recid = p.recid
   AND    p.recid <> p.master_recid  
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.cdesc LIKE '%excision/biopsy pathology%'
AND    c.recid = p.recid
AND    p.recid <> p.master_recid  
RETURNING c.recid;

The lab table is defined (from the previous question) as:

CREATE TABLE lab (
 recid serial NOT NULL,
 cpt_recid integer,
  ........
 CONSTRAINT cs_cpt FOREIGN KEY (cpt_recid)
   REFERENCES cpt (recid) MATCH SIMPLE
   ON UPDATE NO ACTION ON DELETE RESTRICT,
 ...
);

My questions are:

  1. How can string pattern matching be used with the partition, and
  2. How does the upd_lab update get called to move records in the lab table to the new master_recid?

Edit #1:
The following DOES work to do what I need, but I'm still unclear as to how the upd_lab cte is being called:

WITH plan AS (
   SELECT recid, min(recid) OVER (PARTITION BY cdesc LIKE '%excision/biopsy pathology%') AS master_recid
   FROM cpt
   WHERE cpt.cdesc LIKE '%excision/biopsy pathology%'
   )
, upd_lab AS (
   UPDATE lab l
   SET    cpt_recid = p.master_recid   -- link to master recid ...
   FROM   plan p
   WHERE  l.cpt_recid = p.recid
   AND    p.recid <> p.master_recid  -- ... only if not linked to master
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.cdesc LIKE '%excision/biopsy pathology%'
AND    c.recid = p.recid
AND    p.recid <> p.master_recid  -- ... only if notmaster
RETURNING c.recid;   
Community
  • 1
  • 1
Alan Wayne
  • 5,122
  • 10
  • 52
  • 95
  • 1
    What do you mean by `but I'm still unclear as to how the upd_lab cte is being called`? You obviously called it and it worked ... Start by [reading the manual here](http://www.postgresql.org/docs/current/interactive/queries-with.html). – Erwin Brandstetter Jun 27 '15 at 22:26
  • @ErwinBrandstetter If I understand it correctly, the "With" --both the plan and the upd_lab -- will be executed automatically (without an explicit call in the parent delete) for each row the parent delete accesses. Right? :) – Alan Wayne Jun 28 '15 at 14:20
  • 1
    Almost true. However, data-modifying CTEs are treated differently from simple CTEs (with only `SELECT`). Details : http://stackoverflow.com/a/15810159/939860 – Erwin Brandstetter Jun 28 '15 at 22:05

1 Answers1

1

Just remove the partition by. The where is doing the work you want:

WITH plan AS (
      SELECT recid, cdesc, min(recid) OVER () AS master_recid
      FROM cpt
      WHERE cpt.cdesc LIKE '%excision/biopsy pathology%'
   ),
   upd_lab AS (
    UPDATE lab l
       SET    cpt_recid = p.master_recid   
       FROM   plan p
       WHERE  l.cpt_recid = p.recid AND
              p.recid <> p.master_recid  
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.cdesc LIKE '%excision/biopsy pathology%' AND
       c.recid = p.recid AND
       p.recid <> p.master_recid  
RETURNING c.recid;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786