The formula below is a special case of the Wasserstein distance/optimal transport when the source and target distributions, x
and y
(also called marginal distributions) are 1D, that is, are vectors.
where F^{-1} are inverse probability distribution functions of the cumulative distributions of the marginals u
and v
, derived from real data called x
and y
, both generated from the normal distribution:
import numpy as np
from numpy.random import randn
import scipy.stats as ss
n = 100
x = randn(n)
y = randn(n)
How can the integral in the formula be coded in python and scipy? I'm guessing the x and y have to be converted to ranked marginals, which are non-negative and sum to 1, while Scipy's ppf
could be used to calculate the inverse F^{-1}'s?