I have an unknown continuous probability distribution p(x) that is expensive to sample from, but cheap to evaluate, and I would like to estimate its differential entropy. Some other details that might not matter are that x is 9-dimensional, and that the distribution is likely multi-modal with an unknown number of modes. I would prefer a solution in Python, ideally one that is PyTorch compatible.
Currently, I have a number (~1000) of x samples proposed from some distribution that is cheap to sample and evaluate (e.g. uniform or Gaussian), and I can evaluate each p(x) easily. I roughly know the bounds of where p(x) is "high". My idea for estimating the entropy is either:
- fit a GMM to the weighted samples, then estimate the entropy of the GMM
- duplicate the sampled x according to their probability, then estimate the entropy of the samples using KDE methods
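In either case, I am assuming the weights should be the self-normalized importance ratios w_i ∝ p(x_i)/q(x_i), where q is the proposal density; normalizing them to sum to 1 also means an unknown normalizing constant in p would cancel. A minimal PyTorch sketch of what I mean:

```python
import torch

def importance_weights(log_p, log_q):
    """Self-normalized importance weights for samples x_i drawn from a proposal q.

    log_p, log_q: 1-D tensors holding log p(x_i) and log q(x_i).
    Normalizing the weights to sum to 1 means an unknown normalizing
    constant in p cancels out.
    """
    log_w = log_p - log_q
    log_w = log_w - torch.logsumexp(log_w, dim=0)  # normalize in log space
    return log_w.exp()
```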
For option 1, I would prefer not to have to specify the number of GMM components in advance.
- sklearn has the Dirichlet Process Gaussian Mixture Model, which has the intended behavior, but there is no API for fitting to weighted samples (a resampling workaround is sketched after this list). There is an open pull request for doing so: https://github.com/scikit-learn/scikit-learn/pull/17130
- This standalone repository https://github.com/ktrapeznikov/dpgmm may be what I need - I will update this question after testing it (edit: it's out of date and refers to sklearn internals, so it is not usable)
- pomegranate, mentioned in a related question ("python Fitting weighted data with Gaussian mixture model (GMM) with minimum on covariance"), seems to support fitting weighted data, but there were major API changes and missing tutorials since 1.0, and there doesn't seem to be an easy way to avoid setting the number of components
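To make option 1 concrete, here is a rough sketch of what I am considering with sklearn's BayesianGaussianMixture (the Dirichlet process prior should prune unused components, so the component count only needs to be an upper bound). Since fit() takes no sample weights, I resample points in proportion to their weights as a workaround, and since a GMM's entropy has no closed form I estimate it by Monte Carlo; max_components, n_resample and n_mc are arbitrary placeholders:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def dpgmm_entropy(X, weights, max_components=30, n_resample=20_000,
                  n_mc=100_000, seed=0):
    """Fit a Dirichlet-process GMM to weighted samples, then estimate its entropy.

    X: (n, d) array of proposal samples; weights: (n,) importance weights.
    fit() has no sample_weight argument, so points are resampled in
    proportion to their weights as a workaround.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    X_rep = X[rng.choice(len(X), size=n_resample, replace=True, p=p)]

    gmm = BayesianGaussianMixture(
        n_components=max_components,  # upper bound; the DP prior prunes unused components
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        max_iter=500,
    ).fit(X_rep)

    # A GMM's entropy has no closed form: use H ≈ -E[log q(X)] with X ~ fitted GMM.
    X_mc, _ = gmm.sample(n_mc)
    return -gmm.score_samples(X_mc).mean()
```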
For option 2, I will have extra hyperparameters to play around with, since some of the sampled x have very low probability. The ratio between the highest and lowest p(x) may be a factor of 10000, so finding a greatest common divisor of the probabilities and using it as the weight of a single copy is likely not feasible. I would need a cutoff on p(x), and even then the size of the sample set would increase significantly.
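One way I can see to do the duplication without hunting for a common divisor is to scale the normalized weights to some target number of copies and round, which also acts as the cutoff (points whose count rounds to zero are simply dropped); target_total below is an arbitrary placeholder:

```python
import numpy as np

def duplicate_by_weight(X, weights, target_total=20_000):
    """Duplicate each row of X roughly in proportion to its weight.

    Scaling the normalized weights to target_total copies avoids searching
    for a common divisor; points whose count rounds down to zero are
    dropped, which acts as the cutoff for very-low-probability samples.
    """
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    counts = np.floor(target_total * p).astype(int)
    return np.repeat(X, counts, axis=0)
```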
Other notes on option 2:

- scipy.stats has differential entropy estimation from samples
- manual histogram approaches may not be feasible due to x being 9-dimensional
- lots of options and papers show up in a search, but few implementations
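One caveat I noticed is that scipy.stats.differential_entropy works along a single axis, so as far as I can tell it gives per-dimension estimates rather than the joint 9-dimensional entropy. For the KDE route, scipy.stats.gaussian_kde does accept per-sample weights (SciPy >= 1.2), which might let me skip the duplication step entirely; a rough sketch of what I have in mind (n_mc is an arbitrary placeholder, and I expect a plug-in estimate like this to be biased in 9 dimensions with only ~1000 distinct points):

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_entropy(X, weights, n_mc=50_000, seed=0):
    """Weighted Gaussian KDE on X of shape (n, d), then a Monte Carlo entropy estimate.

    gaussian_kde accepts per-sample weights (SciPy >= 1.2), so the samples
    do not have to be physically duplicated.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    kde = gaussian_kde(X.T, weights=w)    # gaussian_kde expects shape (d, n)
    X_mc = kde.resample(n_mc, seed=seed)  # samples from the KDE, shape (d, n_mc)
    return -kde.logpdf(X_mc).mean()       # H ≈ -E[log kde(X)] with X ~ kde
```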
Are option 1 and option 2 equally valid? My intuition is that a GMM can fit p(x) reasonably well. Do you have any suggestions for implementations for option 1 or 2?