I am running into a problem when trying to normalize a function (so that it integrates to unity) that is written in C++ and included in R via Rcpp. The C++ function looks like this:
// C++ - Function: Standard Gamma Kernel
// [[Rcpp::export]]
NumericVector g_hat(double b, NumericVector u, NumericVector x){
  int n_arg = x.size();
  int size = u.size();
  NumericVector g(n_arg);
  double integ = 0;
  for(int j = 0; j < n_arg; j++){
    for(int i = 0; i < size; i++){
      g[j] = g[j] + (1/(b*double(size)))*pow(u[i]/b,x[j]/b)*exp(-u[i]/b)/(tgamma(x[j]/b + 1));
    }
    integ += g[j];
  }
  integ = integ - 0.5*g[n_arg - 1] - 0.5*g[0];
  integ = integ*(x[n_arg - 1] - x[0]);
  integ = integ/(double(n_arg));
  g = g/double(integ);
  return(g);
}
This function performs a kernel density estimation together with a rough approximation of its integral. In the last few lines I use this approximation to rescale the function so that it integrates to approximately 1. The code is written to be readable rather than lean, and the explicit typecasts are there to rule out classic rookie mistakes.
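As an independent sanity check of the rescaling, I compare against a plain trapezoid sum in R (`trap_check` is just a throwaway helper written for this post, applied to the same `grid` and `data` as in the reproducible example further down):

# trapezoid approximation of the integral of ys over the points xs
trap_check <- function(xs, ys) {
  sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
}
# e.g. trap_check(grid, g_hat(x = grid, u = data, b = 0.5)) should come out close to 1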
A plot of the function for the given data (black) is presented here. The fact that it is visually indistinguishable from its R counterpart, the `dbckden` function from the R package `evmix` (red), gave me some confidence that I am on the right track.
**But now I run into a problem.**
Using the `integrate()` function in R gives me totally different results. Running this code
integrate(f = g_hat, u = data, b = 5.5, lower = 0, upper = 15)$value
integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 15)$value
yields 17.49247 for my "rescaled function" and 1 for the R package version of the function, showing that my renormalization effort failed miserably.
> integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 50)$value
[1] 17.49247
> integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 50)$value
[1] 1
My analysis of what went wrong wasn't fruitful. Changing the `g_hat()` function so that it returns a `double` and shows the value of `integ` gives
> g_hat(x = grid, u = data, b = 0.5)
[1] 0.7254907
When I plug this value (0.72...) into the last few lines of `g_hat` like this:
g = g/double(0.7254907);
return(g);
and perform the integration in R again, I get:
> integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 50)$value
[1] 1.004675
> integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 50)$value
[1] 1
meaning that the approximate renormalization works quite well.
But why did it not work when using the `integ` variable as seen in the code above? Why do I need to plug the number into the function explicitly?
Furthermore, why are the graphs of the two functions so close and yet they integrate to completely different values? Is it the `integrate()` function in R? Or is it the Rcpp-specific `NumericVector`?
I'm very thankful for any advice, as a lot of debugging time has gone into this one so far.
(Details: Changing the `upper` and `lower` limits of the integration, or `subdivisions` and `rel.tol`, did not show any significant changes. The same holds true for changes in the parameter `b`.)
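One representative attempt looked roughly like this (the exact values of `subdivisions` and `rel.tol` varied between attempts, so these numbers are only illustrative):

integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 50,
          subdivisions = 1000L, rel.tol = 1e-8)$value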
Thanks.
EDIT: As kindly hinted at by Christoph, I should add a reproducible example.
Here is the C++ code; you can run it in RStudio (please make sure you have `library(Rcpp)` loaded somewhere). As the data have changed (10 data points), the plot looks a bit different, but the paradox remains exactly the same: the graphs look the same, yet the integral is totally off (by a factor of > 33). Again, manipulating `subdivisions` (from 22 to 10e7) or `abs.tol` (from 0.01 to 25) did not do the trick.
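For reference, this is roughly how I compile and run it (assuming the C++ code below is saved as g_hat.cpp; the filename is only an example):

library(Rcpp)
# compiles g_hat() and afterwards runs the embedded /*** R ... */ block at the end of the file
sourceCpp("g_hat.cpp")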
// This is the C++ code
#include <Rcpp.h>
// tell sourceCpp where to find the boost header below (via the BH package)
// [[Rcpp::depends(BH)]]
#include <boost/math/special_functions/gamma.hpp>
using namespace Rcpp;

// C++ - Function: Standard Gamma Kernel
// [[Rcpp::export]]
NumericVector g_hat(double b, NumericVector u, NumericVector x){
  int n_arg = x.size();
  int size = u.size();
  NumericVector g(n_arg);
  // variable which approximates the area beneath the graph of the function g
  double integ = 0;
  // loop over the domain of the function
  for(int j = 0; j < n_arg; j++){
    // loop over the data
    for(int i = 0; i < size; i++){
      g[j] = g[j] + (1/(b*double(size)))*pow(u[i]/b,x[j]/b)*exp(-u[i]/b)/(tgamma(x[j]/b + 1));
    }
    integ += g[j];
  }
  // correction of the approximate integral (trapezoidal rule)
  integ = integ - 0.5*g[n_arg - 1] - 0.5*g[0];
  integ = integ*(x[n_arg - 1] - x[0]);
  integ = integ/(double(n_arg));
  // normalization of g
  g = g/double(integ);
  return(g);
}
/*** R
# This is the R code
data <- c(1.98559739, 0.86798303, 0.48703074, 1.11475725, 1.69790403, 0.09901693, 0.21825991, 1.08029421, 0.60396438, 0.83915639)
grid <- seq(from = 0, to = 20, by = 0.01)
library(evmix)
# plot of the 2 function graphs
plot(grid,g_hat(x = grid, u = data, b = 0.5),type="l", main = "Comparison: g_hat (C++) and dbckden (R)",ylab="")
lines(grid,dbckden(x = grid, kerncentres = data, lambda = 0.5, bcmethod = "gamma1"),col="red")
legend("topright", legend=c("dbckden", "g_hat"), col = c("red","black"), lty = c(1,1))
# definite integrals
integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 55)$value # 33.58002
integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 55)$value # 1
*/