0

I am implementing some preprocessing of variables in the context of a paper called A Neural Bayesian Estimator for Conditional Probability Densities.

It states: 1.) Given a non-linear, a monotonous variable transformation F:t->s such that s is distributed uniformly is applied. This can be achieved as mentioned in the paper by:

>>> sorting the target vector in ascending order
>>> fitting the spline to data, e.g. using interpolate from scipy

2.) Afterwards s is scaled to lie between -1 and 1. This can be achieved with interp:

>>> from numpy import interp
>>> interp(256,[1,512],[5,10])

3.) Finally, the flat distributions need to be converted into Gaussian distributions, centered at zero with std 1.

While the first two steps are clear on how to implement, I am struggling with the third one.

Regarding 3.), the author further states that the inverse of the integrated X^2 (X...chi) function may be used. Are there any libraries, preferably Python, suitable for this job?

Update 1:

After reading the paper again, it seems that X^2 doesn't directly relate to chi, but is rather calculated as follows:

X^2 = P*(1-o)^2+(1-P)*((-1)-o)^2

with P as the purity (can be computed easily given some variable) and o the variable itself.

For a given s scaled to lie between -1 and 1, I may just calculated the integral with lower bound=-1 and upper bound=s and then getting the inverse of it.

Question: How to do this numerically?

Scholle
  • 1,521
  • 2
  • 23
  • 44

1 Answers1

1

If you mean X2 distribution with PDF as described here, then what you're looking is X2 CDF. It is expressed via incomplete Gamma function, see same reference, and you could use SciPy to compute it, this or that should fit the bill. Don't forget full Gamma-function in denominator.

To find inverse of the incomplete Gamma, you could look at inverse functions from SciPy: this or that.

Therefore, I don't believe you'll need all this interpolation stuff

UPDATE

That expression could be computed analytically, say, using online integrator like that. Just calculate the difference between resulat at upper bound and result at lower bound and you're set

UPDATE II

You have to set intervals yourself

Below is (absolutely untested!) code you may try to use. Note, I use generic root finding routine, though because integral is a polynomial, more optimal way might be to use polynomial roots from here, or even code it yourself - it is just a cubic equation

def intgrl(x):
    return x*(x*(3.0 + x - 6.0*p) + 3.0)/3.0

def CDF(x, norm):
    return (intgrl(x) - intgrl(-1.0))/norm

def f(x, norm, rn):
    return CDF(x, norm) - rn

norm = intgrl(1.0) - intgrl(-1.0)

rn = 0.12345
res = scipy.optimize.brentq(f, -1.0, 1.0, args=(norm, rn))

UPDATE III

Variable rn was defined to be some (random U(0,1)) number from 0 to 1.

from scipy.optimize import brentq
import numpy as np
import matplotlib.pyplot as plt

def denormPDF(x, p):
    return p*(1.0-x)**2 + (1.0-p)*((-1.0)-x)**2

def intgrl(x, p):
    return x*(x*(3.0 + x - 6.0*p) + 3.0)/3.0

def CDF(x, p, norm):
    return (intgrl(x, p) - intgrl(-1.0, p))/norm

def PDF(x, p, norm):
    return denormPDF(x, p)/norm

def f(x, p, norm, rn):
    return CDF(x, p, norm) - rn

p = 0.25

norm = intgrl(1.0, p) - intgrl(-1.0, p)

x = np.linspace(-1.0, 1.0, 100)
y = [PDF(x, p, norm) for x in x]
z = [CDF(x, p, norm) for x in x]

# plot PDF
plt.plot(x, y)
plt.show()

# plot CDF
plt.plot(x, z)
plt.show()

rn   = np.linspace(0.000001, 1.0-0.000001, 50)
iCDF = [brentq(f, -1.0, 1.0, args=(p, norm, rn)) for rn in rn]

# plot inverse CDF
plt.plot(rn, iCDF)
plt.show()
Community
  • 1
  • 1
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • Looks promising! Where to set the upper/lower bound in the online calculator? Regarding the calculation of the difference to get the inverse of the integral: You probably mean calculating the integral A from -1 to 1 and B from -1 to o (o=the variable of some value between -1 and 1), then the inverse C is C=A-B. Do u know any python lib coz I need to do this programatically. – Scholle Jul 01 '16 at 09:16
  • How is variable "rn" defined? I tested your code. It executes for x>=0, but throws a ValueError if x<0; e.g. assume p=0.25 (p is in [0,1]) and x=-0.5, which is a valid value due to scaling in step 2 of the preprocessing step. – Scholle Jul 04 '16 at 09:53
  • @Scholle ok, updated the code, and tested it, works for me. It plots PDF, CDF, and inverse CDF. Was on vacation, so ... – Severin Pappadeux Jul 14 '16 at 01:19