2

According the instructions here I have created two functions that use EXECUTE FORMAT and return the same table of (int,smallint).

Sample definitions:

CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass, 
IN field1 integer) 
RETURNS TABLE(id integer, dist smallint)

CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass, 
IN field1 integer) 
RETURNS TABLE(id integer, dist smallint)

Both functions return the exact same number of rows. Sample result (will be always ordered by dist):

(49,0)
(206022,3)
(206041,3)
(92233,4)

Is there a way to compare values of the second field between the two functions for the same rows, to ensure that both results are the same:

For example:

SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)

Returns something like:

(49,0)      (49,0)
(206022,3)  (206022,3)
(206041,3)  (206041,3)
(92233,4)   (133,4)

Although I am not expecting identical results (each function is a topK query and I have ties which are broken arbitrarily / with some optimizations in the second function for faster performance) I can ensure that both functions return correct results, if for each row the second numbers in the results are the same. In the example above, I can ensure I get correct results, because:

1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4

despite the fact that for the 4th row, 92233!=133

Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:

SELECT COUNT(*)
FROM 
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE  function1('tblp1','tblp2',49).field2 != function1('tblp1','tblp2',49).field2;

I am using PostgreSQL 9.3.

Community
  • 1
  • 1
Alexandros
  • 2,160
  • 4
  • 27
  • 52
  • 1
    You have to use `select * from function1('tblp1','tblp2',49)` for set returning functions. Do not put them into the select list. –  Mar 02 '15 at 12:31
  • You added `(will be always ordered by dist)`. Are you sure, this doesn't open possibilities for *false positives*? – Erwin Brandstetter Mar 02 '15 at 12:50
  • @ErwinBrandstetter It is a topK query (ORDERED BY dist LIMIT k), so results inside the functions are ordered by dist. I want to check in the specific example that I get one answer with dist =1, two answers with dist =3 and one answer for dist =4 for k=4. Both functions operate on different tables (the 2nd works on optimized tables) and I want to make sure that both functions return the same dist for all k results. In our case the dist 1 and dist 3 results will be always be the same (there is not a other result with dist=3) but in the case of 4 I might get dist =4 for many results. – Alexandros Mar 02 '15 at 13:04
  • So, rows are returned in *matching* order? The rest is not important for the question. – Erwin Brandstetter Mar 02 '15 at 13:09
  • @a_horse_with_no_name: That's actually missing the point of the question. – Erwin Brandstetter Mar 02 '15 at 13:11

3 Answers3

1

The order in which the rows are returned from the functions is not guaranteed. If you can return the row_number() (rn in the below example) from the functions then:

select
    count(f1.dist is null or f2.dist is null or null) as diff_count
from
    function1('tblp1','tblp2',49) f1
    inner join
    function2('tblp1_v2','tblp2_v2',49) f2 using(rn)
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
1

Is there a way to get only the 2nd field of each function result, to batch compare them?

All of the following answers assume that rows are returned in matching order.

Postgres 9.3

With the quirky feature of exploding rows from SRF functions returning the same number of rows in parallel:

SELECT count(*) AS mismatches
FROM  (
   SELECT function1('tblp1','tblp2',49) AS f1
        , function2('tblp1_v2','tblp2_v2',49) AS f2
   ) sub
WHERE  (f1).dist <> (f2).dist;  -- note the parentheses!

The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.

This defaults to Cartesian product of rows if the number of returned rows is not the same (which would break it completely for you).

Postgres 9.4

WITH ORDINALITY to generate row numbers on the fly

You can use WITH ORDINALITY to generate a row number o the fly and don't need to depend on pairing the result of SRF functions in the SELECT list:

SELECT count(*) AS mismatches
FROM      function1('tblp1','tblp2',49)       WITH ORDINALITY AS f1(id,dist,rn)
FULL JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id,dist,rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist;

This works for the same number of rows from each function as well as differing numbers (which would be counted as mismatch).

Related:

ROWS FROM to join sets row-by-row

SELECT count(*) AS mismatches
FROM   ROWS FROM (function1('tblp1','tblp2',49)
                , function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE  t.dist1 IS DISTINCT FROM t.dist2;

Related answer:

Aside:
EXECUTE FORMAT is not a set plpgsql functionality. RETURN QUERY is. format() is just a convenient function for building a query string, can be used anywhere in SQL or plpgsql.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Are you sure it works with PostgreSQL 9.3? I get a syntax error and I have seen it only on http://www.postgresql.org/docs/devel/static/functions-srf.html. – Alexandros Mar 02 '15 at 12:06
  • @Alexandros It is a 9.4+ feature – Clodoaldo Neto Mar 02 '15 at 12:12
  • @Alexandros As I understand even if it worked in 9.3 it would number a result set possibly not in the same order as inside the function hence the row numbers must be assigned inside the function. – Clodoaldo Neto Mar 02 '15 at 12:20
  • @Alexandros: Yes, sorry, `WITH ORDINALITY` is for pg 9.4, not 9.3. I added some more. – Erwin Brandstetter Mar 02 '15 at 12:43
  • @ClodoaldoNeto: `WITH ORDINALITY` number rows just like they are returned from the function. So yes, If the returned order of rows does not match, I see no way to fix this ex post and we would need to add row numbers *inside* each function and return them ... – Erwin Brandstetter Mar 02 '15 at 12:47
0

For future reference:

Checking difference in number of rows:

SELECT 
ABS(count(f1a.*)-count(f2a.*))  
FROM
(SELECT f1.dist, row_number()  OVER(ORDER BY f1.dist) rn
FROM
function1('tblp1','tblp2',49) f1)
f1a FULL JOIN 

(SELECT f2.dist, row_number() OVER(ORDER BY f2.dist) rn
FROM
function2('tblp1_v2','tblp2_v2',49) f2) f2a
USING (rn);

Checking difference in dist for same ordered rows:

SELECT 
COUNT(*)  

FROM

(SELECT f1.dist, row_number()  OVER(ORDER BY f1.dist) rn
FROM
function1('tblp1','tblp2',49) f1)
f1a 
(SELECT f2.dist, row_number() OVER(ORDER BY f2.dist) rn
FROM
function2('tblp1_v2','tblp2_v2',49) f2) f2a
WHERE f1a.rn=f2a.rn
AND f1a.distance <> f2a.distance;

A simple OVER() might also work since results of the functions are already ordered but is added for extra check.

Alexandros
  • 2,160
  • 4
  • 27
  • 52