0

I have function with multiple RETURN QUERY statements. At the end I want to get only N random records. If I set LIMIT max_size to each RETURN QUERY statement at the end I'll get max_size * count of RETURN QUERY times.

Short version of my function:

CREATE OR REPLACE FUNCTION android_getproposedsongs_test(
                                          IN puserid varchar, max_size int)
    RETURNS TABLE(sid uuid, method text) AS
$BODY$
DECLARE
    songCount int;
BEGIN

RETURN QUERY 
    SELECT trackid as sid, lower('popular') as method
    FROM ratingrecord
    WHERE trackid NOT IN (
       SELECT trackid FROM ratingrecord WHERE userid = puserid)
    AND ratingrecord.rating > 0
    GROUP BY trackid
    HAVING SUM(rating) > 0
    ORDER BY SUM(rating) DESC
    LIMIT max_size;

CREATE TEMP TABLE recommended ON COMMIT DROP 
AS
SELECT trackid, lower('recommended')
FROM ratingrecord finalRate
    INNER JOIN 
    (
        SELECT otherRate.userid AS otherUserId
             , SUM(myRate.rating * otherRate.rating) as SumRating 
        FROM ratingrecord AS myRate
        INNER JOIN ratingrecord AS otherRate 
        ON myRate.trackid = otherRate.trackid
        WHERE myRate.userid = puserid AND myRate.userid != otherRate.userid         
        GROUP BY otherRate.userid
        HAVING SUM(myRate.rating * otherRate.rating) > 0
    )  AS userRelations 
    ON finalRate.userid = userRelations.otherUserId
WHERE finalRate.trackid NOT IN (SELECT trackid FROM ratingrecord
                                WHERE userid = puserid)
GROUP BY finalRate.trackid
HAVING SUM(finalRate.rating * userRelations.SumRating) > 0
ORDER BY SUM(finalRate.rating * userRelations.SumRating) DESC
LIMIT max_size;

RETURN QUERY SELECT * FROM recommended;

-- another RETURN QUERY statements

END;    
$BODY$ 
LANGUAGE plpgsql VOLATILE;

I want something like ORDER BY random() LIMIT max_size at the result set, but don't know where to place it.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
azhidkov
  • 1,047
  • 3
  • 14
  • 23

1 Answers1

0

Either you use UNION ALL to combine all subqueries in pure SQL and add ORDER BY random() LIMIT n once at the end.

Or keep track of the number of rows returned in PL/pgSQL like this:

CREATE OR REPLACE FUNCTION android_getproposedsongs_test(puserid varchar, max_size int)
  RETURNS TABLE(sid uuid, method text)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct         int;
BEGIN
   RETURN QUERY
   SELECT ...        -- query 1
   LIMIT  max_size;  -- apply to each individual query!

   GET DIAGNOSTICS _ct = ROW_COUNT;
   IF _ct >= max_size THEN  -- "= max_size" really, but >= can't hurt
      RETURN;
   ELSE 
      max_size := max_size - _ct;
   END IF;

   RETURN QUERY
   SELECT ...        -- query 2
   LIMIT  max_size;  -- reduced LIMIT may lead to faster query plan

   GET DIAGNOSTICS _ct = ROW_COUNT;
   IF _ct >= max_size THEN  -- "= max_size" really, but >= can't hurt
      RETURN;
   ELSE 
      max_size := max_size - _ct;
   END IF;
   
   -- more RETURN QUERY statements ...
END 
$func$;

Related:

However, this only returns the first N rows, not a random selection. To achieve that, return all qualifying rows without limit and apply ORDER BY random() LIMIT n in the next step:

SELECT *
FROM   android_getproposedsongs_test('foo', 3)
ORDER  BY random() LIMIT 123;

To get truly random rows all (possibly many) candidates need an equal chance. It can be much cheaper to return the first N arbitrary rows.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228