12

Here is the code:

CREATE TABLE audit_trail (
      old_email TEXT NOT NULL,
      new_email TEXT NOT NULL
);

INSERT INTO audit_trail(old_email, new_email)
  VALUES ('harold_gim@yahoo.com', 'hgimenez@hotmail.com'),
         ('hgimenez@hotmail.com', 'harold.gimenez@gmail.com'),
         ('harold.gimenez@gmail.com', 'harold@heroku.com'),
         ('foo@bar.com', 'bar@baz.com'),
         ('bar@baz.com', 'barbaz@gmail.com');


WITH RECURSIVE all_emails AS (
  SELECT  old_email, new_email
    FROM audit_trail
    WHERE old_email = 'harold_gim@yahoo.com'
  UNION
  SELECT at.old_email, at.new_email
    FROM audit_trail at
    JOIN all_emails a
      ON (at.old_email = a.new_email)
)
SELECT * FROM all_emails;

        old_email         |        new_email
--------------------------+--------------------------
 harold_gim@yahoo.com     | hgimenez@hotmail.com
 hgimenez@hotmail.com     | harold.gimenez@gmail.com
 harold.gimenez@gmail.com | harold@heroku.com
(3 rows)

select old_email, new_email into iter1
from audit_trail where old_email = 'harold_gim@yahoo.com';
select * from iter1;
--       old_email       |      new_email
-- ----------------------+----------------------
--  harold_gim@yahoo.com | hgimenez@hotmail.com
-- (1 row)

select a.old_email, a.new_email into iter2
from audit_trail a join iter1 b on (a.old_email = b.new_email);
select * from iter2;
--       old_email       |        new_email
-- ----------------------+--------------------------
--  hgimenez@hotmail.com | harold.gimenez@gmail.com
-- (1 row)

select * from iter1 union select * from iter2;
--       old_email       |        new_email
-- ----------------------+--------------------------
--  hgimenez@hotmail.com | harold.gimenez@gmail.com
--  harold_gim@yahoo.com | hgimenez@hotmail.com
-- (2 rows)

As you can see the recursive code gives the result in right order, but the non-recursive code does not.
They both use union, why the difference?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
qed
  • 22,298
  • 21
  • 125
  • 196
  • There is no `ORDER BY` anywhere, so no. Even a bare `SELECT * FROM table` is not guaranteed to be ordered unless you have an `ORDER BY` clause. – Colonel Thirty Two Aug 12 '15 at 21:50
  • In this case how would `order by` be useful? You basically want to preserve the order in which the table "grows". – qed Aug 12 '15 at 21:52
  • 2
    @qed there is no order in which a "table grows". The data is not guaranteed to have any particular order in data pages. Which means the newer tuples are not guaranteed to always follow the older ones – zerkms Aug 12 '15 at 21:55
  • ok, but there must be a way to simulate the record of growth, right? – qed Aug 12 '15 at 21:56
  • Yep, create a sequence and put it in every new row (if you don't have it) – zerkms Aug 12 '15 at 21:57
  • Add an `id serial PRIMARY KEY` column to your table (which has other benefits) and use `ORDER BY id`. – Colonel Thirty Two Aug 12 '15 at 22:08

3 Answers3

23

Basically, your query is incorrect to begin with. Use UNION ALL, not UNION or you would incorrectly remove duplicate entries. (There is nothing to say the trail cannot switch back and forth between the same emails.) And UNION is highly likely to reorder rows.

The Postgres implementation for UNION ALL typically returns values in the sequence as appended - as long as you do not add ORDER BY at the end or do anything else with the result. But there is no formal guarantee, and since the advent of Parallel Append plans in Postgres 11, this can actually break. See this related post:

Be aware though, that each SELECT returns rows in arbitrary order unless ORDER BY is appended. There is no natural order in tables.

So this usually works:

SELECT * FROM iter1
UNION ALL  -- union all!
SELECT * FROM iter2;

To get a reliable sort order, and "simulate the record of growth", you can track levels like this:

WITH RECURSIVE all_emails AS (
   SELECT  *, 1 AS lvl
   FROM    audit_trail
   WHERE   old_email = 'harold_gim@yahoo.com'

   UNION ALL  -- union all!
   SELECT t.*, a.lvl + 1
   FROM   all_emails  a
   JOIN   audit_trail t ON t.old_email = a.new_email
)
TABLE  all_emails
ORDER  BY lvl;

db<>fiddle here
Old sqlfiddle

Aside: if old_email is not defined UNIQUE in some way, you can get multiple trails. You would need a unique column (or combination of columns) to keep it unambiguous. If all else fails you can (ab-)use the internal tuple ID ctid for the purpose of telling trails apart. But you should rather use your own columns. (Added example in the fiddle.)

Consider:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 3
    Do you have a citation for Postgres guaranteeing preserving order with union all? The docs don't seem to guarantee it and I need to know if I can rely on it.. – rrauenza Apr 17 '19 at 14:57
  • 1
    @rrauenza: I don't know of any official guarantee. It's an implementation detail. That's why I formulated: *"implementation for `UNION ALL` is guaranteed ..."*. I removed the "guarantee" now to better reflect that. (But I am pretty sure it's not going to change.) – Erwin Brandstetter Apr 17 '19 at 15:17
  • 1
    i would prefer to follow the standard and not coincidences (and today's implementation details - which could easily change tomorrow), and use explicit `ORDER BY` for the result. Also, another answer: https://dba.stackexchange.com/a/316835/72776 – maxkoryukov Jan 31 '23 at 07:14
5

Ordering is never preserved after any operation in any reasonable database. If you want the result set in a particular order, use ORDER BY. Period.

This is especially true after a UNION. UNION removes duplicates and that operation is going to change the ordering of the rows, in all likelihood.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for the dose of common sense. I think I'll just add an indicator column for my problem in order to force rows from the first table to be included (after a `limit ` op) – information_interchange Aug 01 '19 at 15:23
-1

Order is preserved if one can pass after all unions statement as below:

select "ClassName","SectionName","Students","OrderNo" from table
UNION 
select '----TOTAL----' as "ClassName",'----' as "SectionName",sum("Total Students"),9999 as "OrderNo" from table
ORDER BY "OrderNo"
Rohil Patel
  • 386
  • 3
  • 8