1

Our new database does not (and will not) support PL/R usage, which we rely on extensively to implement a random weighted sample function:

CREATE OR REPLACE FUNCTION sample(
    ids bigint[],
    size integer,
    seed integer DEFAULT 1,
    with_replacement boolean DEFAULT false,
    probabilities numeric[] DEFAULT NULL::numeric[])
    RETURNS bigint[]
    LANGUAGE 'plr'

    COST 100
    VOLATILE 
AS $BODY$
    set.seed(seed)
    ids = as.integer(ids)
    if (length(ids) == 1) {
        s = rep(ids,size)
    } else {
        s = sample(ids,size, with_replacement,probabilities)
    }
    return(s)
$BODY$;

Is there a purely SQL approach to this same function? This post shows an approach that selects a single random row, but does not have the functionality of sampling multiple groups at once.

As far as I know, SQL Fiddle does not support PLR, so see below for a quick replication example:

CREATE TABLE test
    (category text, uid integer, weight numeric)
;
    
INSERT INTO test
    (category, uid, weight)
VALUES
    ('a', 1,  45),
    ('a', 2,  10),
    ('a', 3,  25),
    ('a', 4,  100),
    ('a', 5,  30),
    ('b', 6, 20),
    ('b', 7, 10),
    ('b', 8, 80),
    ('b', 9, 40),
    ('b', 10, 15),
    ('c', 11, 20),
    ('c', 12, 10),
    ('c', 13, 80),
    ('c', 14, 40),
    ('c', 15, 15)
;

SELECT category,
        unnest(diffusion_shared.sample(array_agg(uid ORDER BY uid),
                                       1,
                                       1,
                                       True,
                                       array_agg(weight ORDER BY uid))
                                       ) as uid
FROM test
WHERE category IN ('a', 'b')
GROUP BY category;

Which outputs:

category  uid
'a'       4
'b'       8

Any ideas?

Kevin M
  • 801
  • 3
  • 9
  • 14

0 Answers0