import typing
import numpy as np
import scipy.stats
def run_gaussian_copula_simulation_and_get_samples(
ppfs: typing.List[typing.Callable[[np.ndarray], np.ndarray]], # List of $num_dims percentile point functions
cov_matrix: np.ndarray, # covariance matrix, shape($num_dims, $num_dims)
num_samples: int, # number of random samples to draw
) -> np.ndarray:
num_dims = len(ppfs)
# Draw random samples from multidimensional normal distribution -> shape($num_samples, $num_dims)
ran = np.random.multivariate_normal(np.zeros(num_dims), cov_matrix, (num_samples,), check_valid="raise")
# Transform back into a uniform distribution, i.e. the space [0,1]^$num_dims
U = scipy.stats.norm.cdf(ran)
# Apply ppf to transform samples into the desired distribution
# Each row of the returned array will represent one random sample -> access with a[i]
return np.array([ppfs[i](U[:, i]) for i in range(num_dims)]).T # shape($num_samples, $num_dims)
# Example 1. Uncorrelated data, i.e. both distributions are independent
f1 = run_gaussian_copula_simulation_and_get_samples(
[lambda x: scipy.stats.norm.ppf(x, loc=100, scale=15), scipy.stats.norm.ppf],
[[1, 0], [0, 1]],
6
)
# Example 2. Completely correlated data, i.e. both percentiles match
f2 = run_gaussian_copula_simulation_and_get_samples(
[lambda x: scipy.stats.norm.ppf(x, loc=100, scale=15), scipy.stats.norm.ppf],
[[1, 1], [1, 1]],
6
)
np.set_printoptions(suppress=True) # suppress scientific notation
print(f1)
print(f2)
A few note on this function. np.random.multivariate_normal
does a lot of the heavy lifting for us, note that in particular we do not need to decompose the correlation matrix.
ppfs
is passed as a list of functions which each have one input and one return value.
In my particular use case I needed to generate multivariate-t-distributed random variables (in addition to normal-distributed ones),
consult this answer on how to do that: https://stackoverflow.com/a/41967819/2111778.
Additionally, I used scipy.stats.t.cdf
for the back-transform part.
In my particular use case the desired distributions were empirical distributions representing expected financial loss.
The final data points then had to be added together to get a total financial loss across all
of the individual-but-correlated financial events.
Thus, np.array(...).T
is actually replaced by sum(...)
in my code base.