Methodology
As you pointed out, PostgreSQL is able to generate Uniform distribution using the random()
function.
A general answer to this kind question is Inverse Transform Sampling.
And the limitations of this methodology are:
That is, provided the Quantile Function is explicit and we are able to construct it with PostgreSQL mathematical functions then we can create a Pseudo Random Generator for a specific distribution using random()
as Uniform PRG.
Simple example: Exponential
Inverse Transform Sampling works well for Exponential Distribution:
CREATE OR REPLACE FUNCTION expon(N INTEGER, l FLOAT = 1)
RETURNS SETOF FLOAT AS
$BODY$
SELECT
-(1/l)*ln(1 - random())
FROM
generate_series(1, N) AS i;
$BODY$
LANGUAGE SQL;
This function generates N
samples drawn from an Exponential Distribution of parameter l
.
Lognormal
For Lognormal distribution the Quantile Function relies on the Error Function which is not implemented in PostgreSQL. Thus we need to implement the missing function (which is an integral, not impossible using WINDOWING functions but probably not the best idea) or find another way.
Fortunately, we can generate Normal Distribution samples using the Box-Muller Transform:
CREATE OR REPLACE FUNCTION norm(N INTEGER, mu FLOAT = 0, sigma FLOAT = 1)
RETURNS SETOF FLOAT AS
$BODY$
SELECT
sigma*sqrt(-2.*ln(random()))*cos(2*pi()*random()) + mu
FROM
generate_series(1, N) AS i;
$BODY$
LANGUAGE SQL;
The following call:
SELECT norm(10000);
Gives:

And MLE returns (mu=0.021131501222537274, sigma=1.0042820700537662)
not too bad, we may be on the good track.
Then we can take the exponential of this function:
CREATE OR REPLACE FUNCTION lognorm(N INTEGER, mu FLOAT = 0, sigma FLOAT = 1)
RETURNS SETOF FLOAT AS
$BODY$
SELECT
exp(x)
FROM
norm(N, mu, sigma) AS x;
$BODY$
LANGUAGE SQL;
And we have a PRG for Lognormal distribution.
The following call:
SELECT lognorm(10000);
Also gives acceptable result:

MLE returns (sigma=0.9996878296400589, loc=0.0, exp(mu)=1.0002728392916154)
.
Numerical Integration & Error Function
Although it might be not performant, it is quite easy to estimate the Error Function with PostgreSQL using Trapezoid Rule. Thought it is a naive implementation:
CREATE OR REPLACE FUNCTION erf(x FLOAT, dx NUMERIC = 1e-3)
RETURNS FLOAT AS
$BODY$
WITH
D AS (
SELECT
y::FLOAT,
exp(-((y::FLOAT)^2)) AS fx0,
LEAD(exp(-((y::FLOAT)^2))) OVER(ORDER BY y) AS fx1
FROM
generate_series(0, x::NUMERIC, dx) AS y
)
SELECT
COALESCE((2/sqrt(pi()))*SUM((D.fx1 + D.fx0)*dx::FLOAT/2), 0.)
FROM D;
$BODY$
LANGUAGE SQL IMMUTABLE;
And if we compare the result with exact form (Python, scipy), it looks not too bad we get at least 6 significant figures:
x psql scipy errabs errrel
0 0.0 0.000000 0.000000 0.000000e+00 NaN
5 0.5 0.520500 0.520500 -7.323189e-08 -1.406953e-07
10 1.0 0.842701 0.842701 -6.918458e-08 -8.209863e-08
15 1.5 0.966105 0.966105 -2.973257e-08 -3.077571e-08
20 2.0 0.995322 0.995322 -6.888995e-09 -6.921371e-09
25 2.5 0.999593 0.999593 -9.076190e-10 -9.079885e-10
30 3.0 0.999978 0.999978 -6.962642e-11 -6.962795e-11
35 3.5 0.999999 0.999999 -3.149592e-12 -3.149594e-12
40 4.0 1.000000 1.000000 -8.404388e-14 -8.404388e-14
45 4.5 1.000000 1.000000 1.110223e-16 1.110223e-16
50 5.0 1.000000 1.000000 2.442491e-15 2.442491e-15

So we could use erf
function to perform Inverse Transform Sampling for Normal and Lognormal as we did for Exponential, but I might be a bad idea. It should perform poorly due to Algorithm Complexity and Integration Inaccuracies.
Beta
Unfortunately the Inverse Transform Sampling does not seem to suit for Beta Distribution because the Quantile Function is not expressible as a simple function: it requires to get the inverse of the Regularized Incomplete Beta Function. I don't know if it is possible: Wikipedia does not have Quantile Function referenced for Beta distribution.
For this case you may need to compile the function in some Programming Language (such as C/C++) and bind it to a PostgreSQL function as @Nick Barnes suggested in his comment.
Technical Considerations
As @Nick Barnes pointed out in his comments:
- Function using
random()
are not IMMUTABLE
(they are VOLATILE
the default) because they change seed value of the PostgreSQL PRG;
- Current implementations presented here are naive, they do not handle edge cases such as
ln(0.)
;
- Functions in
LANGUAGE SQL
usually perform well (although we must consider their complexities and their convergences);
- Returning
SETOF FLOAT
is better than using FLOAT[]
and avoid the need to unnest()
, as I did in previous versions of the SQL Functions;
- Limiting cast such as
::FLOAT
whenever it is possible;
- There is a function
pi()
no need to evaluate it with 2.*acos(0.)
.