5

I have a relation

+-----+----+
| seq | id |
+-----+----+
|   1 | A1 |
|   2 | B1 |
|   3 | C1 |
|   4 | D1 |
+-----+----+

and want to join it in PostgreSQL with

+----+-------+
| id | alter |
+----+-------+
| B1 | B2    |
| D1 | D2    |
+----+-------+

so I get all possible combinations of replacement (i.e. the Cartesian product of replacing more or less). So group 1 has no update,group 2 only B2, group 3 only D2 and group 4 both B2 and D2.

The end should look like this, but should be open to more (like an extra D3 for D1)

+-------+-----+----+
| group | seq | id |
+-------+-----+----+
|     1 |   1 | A1 |
|     1 |   2 | B1 |
|     1 |   3 | C1 |
|     1 |   4 | D1 |
|     2 |   1 | A1 |
|     2 |   2 | B2 |
|     2 |   3 | C1 |
|     2 |   4 | D1 |
|     3 |   1 | A1 |
|     3 |   2 | B1 |
|     3 |   3 | C1 |
|     3 |   4 | D2 |
|     4 |   1 | A1 |
|     4 |   2 | B2 |
|     4 |   3 | C1 |
|     4 |   4 | D2 |
+-------+-----+----+

EDIT:

Another possible replacement table could be

+----+-------+
| id | alter |
+----+-------+
| B1 | B2    |
| D1 | D2    |
| D1 | D3    |
+----+-------+

could should result in 6 groups (I hope I haven't forgot a case)

+-------+-----+----+
| group | seq | id |
+-------+-----+----+
|     1 |   1 | A1 |
|     1 |   2 | B1 |
|     1 |   3 | C1 |
|     1 |   4 | D1 |
|     2 |   1 | A1 |
|     2 |   2 | B2 |
|     2 |   3 | C1 |
|     2 |   4 | D1 |
|     3 |   1 | A1 |
|     3 |   2 | B2 |
|     3 |   3 | C1 |
|     3 |   4 | D2 |
|     4 |   1 | A1 |
|     4 |   2 | B2 |
|     4 |   3 | C1 |
|     4 |   4 | D3 |
|     5 |   1 | A1 |
|     5 |   2 | B1 |
|     5 |   3 | C1 |
|     5 |   4 | D2 |
|     6 |   1 | A1 |
|     6 |   2 | B1 |
|     6 |   3 | C1 |
|     6 |   4 | D3 |
+-------+-----+----+

If you have instead three replacements like

+----+-------+
| id | alter |
+----+-------+
| B1 | B2    |
| C1 | C2    |
| D1 | D3    |
+----+-------+

It'll result in 8 groups. What I tried so far was not really helpful:

WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1')   ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
---------
SELECT row_number() OVER (PARTITION BY a.id) as g, * FROM 
a
CROSS JOIN  b as b1
CROSS JOIN  b as b2
LEFT JOIN b as b3 ON a.id=b3.id
ORDER by g,seq;
starball
  • 20,030
  • 7
  • 43
  • 238
sequoia
  • 480
  • 2
  • 13
  • Is this specific to those two substitutions? Or could the second table have more than 2? If so, what if the same value is substituted more than once? – Gordon Linoff Apr 08 '20 at 17:13
  • Hi Gordon, can be more as two of course (what I meant with "like an extra D3 for D1"): So for example D1 would also be replaced by extra D3. This would result in 46 rows with 16 groups I guess. And so on... – sequoia Apr 08 '20 at 17:16
  • 1
    `(what I meant with "like an extra D3 for D1"): So for example D1 would also be replaced by extra D3. This would result in 46 rows with 16 groups I guess` "would also be replaced" leaves room for interpretation; and the numbers "46 rows with 16 groups" don't light my fire, either. Can you explain some more? (Please clarify in the question!) – Erwin Brandstetter May 04 '20 at 23:08
  • You want a dynamic number of `CROSS JOIN`s... interesting. +1 – The Impaler May 07 '20 at 23:46
  • 1
    The number of combinations grows exponentially. You can't have more than 20 or 30 rows in table `b`. – The Impaler May 07 '20 at 23:59
  • @ErwinBrandstetter sorry, combinatorics isn't my strength. I added more possible cases for the replacement table to the question – sequoia May 09 '20 at 12:16
  • In your second example, groups 3 and 5 end up being identical. Is that intended? Can you explain the intention behind the operation in plain English, too? – Erwin Brandstetter May 09 '20 at 12:46
  • @ErwinBrandstetter No, sorry again. I corrected that. I hope now the intention is clear. – sequoia May 09 '20 at 12:50
  • Can there be more than 1 duplicate `id` in the replacement table? (How to combine those?) Can there be more than 2 duplicates per `id`? (How to treat those?) What is the range of possible sets in the replacement table? Can there be `id` values in the replacement table that are not found in the target table? (How to deal with that?) – Erwin Brandstetter May 09 '20 at 12:57
  • Also: are values in `alter` unique in the target table? I.e., the only 'D2' in the result would come from our `UPDATE`? – Erwin Brandstetter May 09 '20 at 13:04
  • @ErwinBrandstetter There can be even more than 2 duplicates per 'id'. Maybe imagine it like you are driving on a highway and at each tollbooth, you don't know which cashier you will get. Your standard cashier would be the one on the left lane (with the number 1). But sometimes you have to use another cashier at a tollbooth. These different cashiers are the given in the replacement table. So, the result table is a combination of all possible cashier combinations from all tollbooths. – sequoia May 09 '20 at 13:08
  • I *think* I get an idea what you are trying to ask. For a given set toll stations (say N), each with a given number of booths (say M), you want to get all possible routes. That's `2^N` possible routes (***combinations*** of toll stations), and for each possible route you have `M1*M2* .. Mn` ***variations*** from the different toll boths. The number of result rows is growing like an atomic bomb: `2^M*M1*M2* .. *Mn`. Or rather: You want all possible variations for a single *given* route. Still *huge* – Erwin Brandstetter May 09 '20 at 13:33
  • I think my first solution covers your case of the powerset of combinations with replacement, but the latter (compressed) one does not. Will update answer when I'm able to get back on the pc – Haleemur Ali May 09 '20 at 13:57
  • I added a solution for what I think you really want. – Erwin Brandstetter May 09 '20 at 14:16
  • Please see: ["how to format a table in a post"](https://meta.stackoverflow.com/q/277716/11107541). – starball Nov 06 '22 at 22:04

4 Answers4

2

So group 1 has no update,group 2 only B2, group 3 only D2 and group 4 both B2 and D2.

Since the logic of this statement is not in a table, I decided to add this logic to table c, which is adding 3 new columns to the existing table a, depending on which selection of field had to be considered.

WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1')   ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
, c as (
SELECT a.seq, a.id,
COALESCE(b1.alter,a.id) as id2,
COALESCE(b2.alter,a.id) as id3,
COALESCE(b3.alter,a.id) as id4
FROM a
LEFT JOIN (SELECT * FROM b WHERE b.alter='B2') b1 ON a.id = b1.id
LEFT JOIN (SELECT * FROM b WHERE b.alter='D2') b2 ON a.id = b2.id
LEFT JOIN (SELECT * FROM b WHERE b.alter IN ('B2','D2')) b3 ON a.id = b3.id)
, d as (SELECT * FROM (values (1),(2), (3), (4)   ) as d1(gr) )



SELECT d.gr,
CASE d.gr
   WHEN 1 THEN c.id
   WHEN 2 THEN c.id2
   WHEN 3 THEN c.id3
   WHEN 4 THEN c.id4 END as id

FROM d
CROSS JOIN  c
ORDER by d.gr, c.seq
CarlosSR
  • 1,145
  • 1
  • 10
  • 22
2

What you need

After additional info from your comments, it seems like this is your case:

You have toll stations with a given number of booths:

CREATE TABLE station (
  station text PRIMARY KEY
, booths  int NOT NULL  -- number of cashiers in station
);
INSERT INTO station VALUES 
  ('A', 1)
, ('B', 2)
, ('C', 1)
, ('D', 3);

For a given route, say A --> B --> C --> D you want to generate all possible paths, taking booth numbers into account. I suggest an SQL function with a recursive CTE like:

CREATE OR REPLACE FUNCTION f_pathfinder(_route text[])
  RETURNS TABLE (grp int, path text[]) LANGUAGE sql STABLE PARALLEL SAFE AS
$func$
WITH RECURSIVE rcte AS (
   SELECT cardinality($1) AS hops, 1 AS hop, ARRAY[s.station || booth] AS path
   FROM   station s, generate_series(1, s.booths) booth
   WHERE  s.station = $1[1]

   UNION ALL
   SELECT r.hops, r.hop + 1, r.path || (s.station || booth)
   FROM   rcte  r
   JOIN   station s ON s.station = _route[r.hop + 1], generate_series(1, s.booths) booth
   WHERE  r.hop < r.hops
   )
SELECT row_number() OVER ()::int AS grp, path
FROM   rcte r
WHERE  r.hop = r.hops;
$func$;

Simple call:

SELECT * FROM f_pathfinder('{A,B,C,D}'::text[]);

Result:

 grp | path
---: | :--------
   1 | {1,1,1,1}
   2 | {1,1,1,2}
   3 | {1,1,1,3}
   4 | {1,2,1,1}
   5 | {1,2,1,2}
   6 | {1,2,1,3}

Or with unnested arrays (result like you show in question):

SELECT grp, seq, booth
FROM   f_pathfinder('{A,B,C,D}'::text[])
     , unnest(path) WITH ORDINALITY AS x(booth, seq);  -- ①

Result:

grp | seq | booth
--: | --: | :----
  1 |   1 | A1   
  1 |   2 | B1   
  1 |   3 | C1   
  1 |   4 | D1   
  2 |   1 | A1   
  2 |   2 | B1   
  2 |   3 | C1   
  2 |   4 | D2   
  3 |   1 | A1   
  3 |   2 | B1   
  3 |   3 | C1   
  3 |   4 | D3   
  4 |   1 | A1   
  4 |   2 | B2   
  4 |   3 | C1   
  4 |   4 | D1   
  5 |   1 | A1   
  5 |   2 | B2   
  5 |   3 | C1   
  5 |   4 | D2   
  6 |   1 | A1   
  6 |   2 | B2   
  6 |   3 | C1   
  6 |   4 | D3   

db<>fiddle here

The number of variants is growing quickly with the number of stops in your route. It's M1*M2* .. *Mn with Mn being the number of booths for the nth station.

① About ORDINALITY:

What you asked (originally)

Seems like you want to apply all possible combinations from the set of changes listed in the replacement table rpl to the target table tbl.

With just two rows, forming the 4 (2^n) possible combinations is simple. For a general solution, I suggest a basic combinatorics function to generate all combinations. There are innumerable ways. Here is a pure SQL function:

CREATE OR REPLACE FUNCTION f_allcombos(_len int)
  RETURNS SETOF bool[] LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
WITH RECURSIVE
   tf(b) AS (VALUES (false), (true))

 , rcte AS (
   SELECT 1 AS lvl, ARRAY[b] AS arr
   FROM   tf

   UNION ALL
   SELECT r.lvl + 1, r.arr || tf.b
   FROM   rcte r, tf 
   WHERE  lvl < _len
   )
SELECT arr
FROM   rcte
WHERE  lvl = _len;
$func$;

Similar to what's discussed here:

Example for just 2 replacement rows:

SELECT * FROM f_allcombos(2);
{f,f}
{t,f}
{f,t}
{t,t}

Query

WITH effective_rpl AS (  -- ①
   SELECT *, count(alter) OVER (ORDER BY seq) AS idx  -- ②
   FROM   tbl LEFT JOIN rpl USING (id)
   )
SELECT c.grp, e.seq
     , CASE WHEN alter IS NOT NULL AND c.arr[e.idx] THEN e.alter  -- ③
            ELSE e.id END AS id
FROM   effective_rpl e
     , f_allcombos((SELECT count(alter)::int FROM effective_rpl))  -- ④
          WITH ORDINALITY AS c(arr, grp); -- ⑤

Produces your desired result exactly.

db<>fiddle here

① Some of the replacements may have no match in the target table; so determine effective replacements to begin with.

count() only counts non-null values, so this can serve as index for the 1-based array returned from f_allcombos().

③ Only replace when a replacement is available, and the boolean array has true for the given index idx.

④ The CROSS JOIN multiplies the set of rows in the target table with the number of possible replacement combinations

⑤ I use WITH ORDINALITY to generate "group numbers". See:

We might wire that into the function directly, but I'd rather keep it generic.

Aside: "alter" is non-reserved in Postgres but a reserved word in standard SQL.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hi Erwin, thanks for your answer. It seems to work for the table I provided. If I add another third possible replacement for D1 (say D3), it produces not the expected outcome. I should have pointed that case out more clearly. – sequoia May 09 '20 at 12:09
  • @sequoia: You seem to have your answer, but I had some more ideas to improved the solution. You might be interested. – Erwin Brandstetter May 11 '20 at 00:45
1

I can only think of a brute force approach. Enumerate the groups and multiply the second table -- so one set of rows for each group.

The following then uses bit manipulation to choose which value:

WITH a as (
      SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1')   ) as a1(seq, id)
      ),
     b as (
      SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter)
     ),
     bgroups as (
      SELECT b.*, grp - 1 as grp, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) - 1 as seqnum
      FROM b CROSS JOIN
           GENERATE_SERIES(1, (SELECT POWER(2, COUNT(*))::int FROM b)) gs(grp)
     )
SELECT bg.grp, a.seq, 
       COALESCE(MAX(CASE WHEN a.id = bg.id AND (POWER(2, bg.seqnum)::int & bg.grp) > 0 THEN bg.alter END),
                MAX(a.id)
               ) as id
FROM a CROSS JOIN
     bgroups bg
GROUP BY bg.grp, a.seq
ORDER BY bg.grp, a.seq;

Here is a db<>fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I'm having a bit of trouble understanding how you've used bit manipulation here, but I'm very keen to learn more. The only approach I could think of is to recursively build the powersets. could you please elaborate on how the bit manipulation works is this solution – Haleemur Ali May 08 '20 at 14:05
  • @HaleemurAli . . . You should ask a question to clarify what you really want to better understand. – Gordon Linoff May 08 '20 at 16:07
1

answer updated after edit to question

The tricky part in this problem is to generate the powerset of the replacements. However, luckily postgres supports recursive queries & powersets can be computed recursively. Thus we can build a general solution to this problem that will work regardless of the size of your replacement set.

Lets call the first table source, the 2nd table replacements, and I'll avoid the unsavory name alter for something else:

CREATE TABLE source (seq, id) as (
  VALUES (1, 'A1'), (2, 'B1'), (3, 'C1'), (4, 'D1')
);
CREATE TABLE replacements (id, sub) as (
  VALUES ('B1', 'B2'), ('D1', 'D2')
);

First powerset of the ids to be replaced needs to be generated. The null-set may be omitted since that won't work with joins anyhow & at the end the source table can be union'ed to the intermediate result to provide the same output.

In the recursive step, the the JOIN condition rec.id > repl.id ensures that each id is present only once for each generated subset.

In the final step:

the cross join fans out the source N times, where N is the number of non-empty combinations of replacements (with variations)

the group names are generated using a filtered runnign sum on seq.

the subsets are unnested & the ids replaced using a coalesce if the replacement id equals the source id.

WITH RECURSIVE rec AS (
  SELECT ARRAY[(id, sub)] subset, id FROM replacements
  UNION ALL
  SELECT subset || (repl.id, sub), repl.id 
  FROM replacements repl 
  JOIN rec ON rec.id > repl.id
)
SELECT NULL subset, 0 set_name, seq, id FROM source
UNION ALL
SELECT subset
, SUM(seq) FILTER (WHERE seq = 1) OVER (ORDER BY subset, seq) set_name 
, seq
, COALESCE(sub, source.id) id
FROM rec 
CROSS JOIN source
LEFT JOIN LATERAL (
  SELECT id, sub 
  FROM unnest(subset) x(id TEXT, sub TEXT)
  ) x ON source.id = x.id;

Tests

With replacement values ('B1', 'B2'), ('D1', 'D2'), the query returns 4 groups.

        subset         | set_name | seq | id 
-----------------------+----------+-----+----
                       |        0 |   1 | A1
                       |        0 |   2 | B1
                       |        0 |   3 | C1
                       |        0 |   4 | D1
 {"(B1,B2)"}           |        1 |   1 | A1
 {"(B1,B2)"}           |        1 |   2 | B2
 {"(B1,B2)"}           |        1 |   3 | C1
 {"(B1,B2)"}           |        1 |   4 | D1
 {"(D1,D2)"}           |        2 |   1 | A1
 {"(D1,D2)"}           |        2 |   2 | B1
 {"(D1,D2)"}           |        2 |   3 | C1
 {"(D1,D2)"}           |        2 |   4 | D2
 {"(D1,D2)","(B1,B2)"} |        3 |   1 | A1
 {"(D1,D2)","(B1,B2)"} |        3 |   2 | B2
 {"(D1,D2)","(B1,B2)"} |        3 |   3 | C1
 {"(D1,D2)","(B1,B2)"} |        3 |   4 | D2
(16 rows)

With replacement values ('B1', 'B2'), ('D1', 'D2'), ('D1', 'D3'), The query returns 6 groups:

        subset         | set_name | seq | id 
-----------------------+----------+-----+----
                       |        0 |   1 | A1
                       |        0 |   2 | B1
                       |        0 |   3 | C1
                       |        0 |   4 | D1
 {"(B1,B2)"}           |        1 |   1 | A1
 {"(B1,B2)"}           |        1 |   2 | B2
 {"(B1,B2)"}           |        1 |   3 | C1
 {"(B1,B2)"}           |        1 |   4 | D1
 {"(D1,D2)"}           |        2 |   1 | A1
 {"(D1,D2)"}           |        2 |   2 | B1
 {"(D1,D2)"}           |        2 |   3 | C1
 {"(D1,D2)"}           |        2 |   4 | D2
 {"(D1,D2)","(B1,B2)"} |        3 |   1 | A1
 {"(D1,D2)","(B1,B2)"} |        3 |   2 | B2
 {"(D1,D2)","(B1,B2)"} |        3 |   3 | C1
 {"(D1,D2)","(B1,B2)"} |        3 |   4 | D2
 {"(D1,D3)"}           |        4 |   1 | A1
 {"(D1,D3)"}           |        4 |   2 | B1
 {"(D1,D3)"}           |        4 |   3 | C1
 {"(D1,D3)"}           |        4 |   4 | D3
 {"(D1,D3)","(B1,B2)"} |        5 |   1 | A1
 {"(D1,D3)","(B1,B2)"} |        5 |   2 | B2
 {"(D1,D3)","(B1,B2)"} |        5 |   3 | C1
 {"(D1,D3)","(B1,B2)"} |        5 |   4 | D3
(24 rows)

With replacement values ('B1', 'B2'), ('C1', 'C2'), ('D1', 'D2'), The query returns 8 groups:

             subset              | set_name | seq | id 
---------------------------------+----------+-----+----
                                 |        0 |   1 | A1
                                 |        0 |   2 | B1
                                 |        0 |   3 | C1
                                 |        0 |   4 | D1
 {"(B1,B2)"}                     |        1 |   1 | A1
 {"(B1,B2)"}                     |        1 |   2 | B2
 {"(B1,B2)"}                     |        1 |   3 | C1
 {"(B1,B2)"}                     |        1 |   4 | D1
 {"(C1,C2)"}                     |        2 |   1 | A1
 {"(C1,C2)"}                     |        2 |   2 | B1
 {"(C1,C2)"}                     |        2 |   3 | C2
 {"(C1,C2)"}                     |        2 |   4 | D1
 {"(C1,C2)","(B1,B2)"}           |        3 |   1 | A1
 {"(C1,C2)","(B1,B2)"}           |        3 |   2 | B2
 {"(C1,C2)","(B1,B2)"}           |        3 |   3 | C2
 {"(C1,C2)","(B1,B2)"}           |        3 |   4 | D1
 {"(D1,D2)"}                     |        4 |   1 | A1
 {"(D1,D2)"}                     |        4 |   2 | B1
 {"(D1,D2)"}                     |        4 |   3 | C1
 {"(D1,D2)"}                     |        4 |   4 | D2
 {"(D1,D2)","(B1,B2)"}           |        5 |   1 | A1
 {"(D1,D2)","(B1,B2)"}           |        5 |   2 | B2
 {"(D1,D2)","(B1,B2)"}           |        5 |   3 | C1
 {"(D1,D2)","(B1,B2)"}           |        5 |   4 | D2
 {"(D1,D2)","(C1,C2)"}           |        6 |   1 | A1
 {"(D1,D2)","(C1,C2)"}           |        6 |   2 | B1
 {"(D1,D2)","(C1,C2)"}           |        6 |   3 | C2
 {"(D1,D2)","(C1,C2)"}           |        6 |   4 | D2
 {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   1 | A1
 {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   2 | B2
 {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   3 | C2
 {"(D1,D2)","(C1,C2)","(B1,B2)"} |        7 |   4 | D2
(32 rows)
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85