
I am dealing with a problem when trying to normalize a function (so that it integrates to unity), written in C++ and included in R via Rcpp. The C++ function looks like this:

// C++ - Function: Standard Gamma Kernel
// [[Rcpp::export]]
NumericVector g_hat(double b, NumericVector u, NumericVector x){
  int n_arg = x.size();
  int size = u.size();
  NumericVector g(n_arg);

  double integ = 0;

  for(int j = 0; j < n_arg; j++){
    for(int i = 0; i < size; i++){
      g[j] = g[j] + (1/(b*double(size)))*pow(u[i]/b,x[j]/b)*exp(-u[i]/b)/(tgamma(x[j]/b + 1));
    }
    integ += g[j];
  }

  integ = integ -0.5*g[n_arg - 1] -0.5*g[0];
  integ = integ*(x[n_arg-1] - x[0]);
  integ = integ/(double(n_arg));

  g = g/double(integ);
  return(g);
}

This function performs a kernel density estimation and a rough approximation of the integral. In the last few lines I use this approximation to rescale the function so that it integrates approximately to 1.

The code is written to be readable, not lean. Explicit typecasts were added to rule out classic rookie mistakes.

A plot of the function for the given data (black) was visually indistinguishable from its R counterpart (the dbckden function from the R package evmix, in red), which gave me some confidence that I am on the right track.


**But now I run into a problem.** Using the integrate() function in R gives totally different results. Running

integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 50)$value
integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 50)$value

yields 17.49247 for my "rescaled function" and 1 for the R-package version of the function, showing that my renormalization effort failed miserably:

> integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 50)$value
[1] 17.49247
> integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 50)$value
[1] 1

Analysis of what went wrong wasn't fruitful. Changing g_hat() to return a double and output the value of integ gives

> g_hat(x = grid, u = data, b = 0.5)
[1] 0.7254907

When I plug this value (0.72...) into the last few lines of the algorithm for g_hat like this:

g = g/double(0.7254907);
return(g);

and perform the integration in R again, I get:

> integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 50)$value
[1] 1.004675
> integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 50)$value
[1] 1

meaning that the approximate renormalization works quite well.

But why did it not work when using the integ variable as in the code above? Why do I need to plug the number into the function explicitly?

Furthermore, why are the graphs of the two functions so close and yet integrate to completely different values? Is it the integrate() function in R? Or is it the Rcpp-specific NumericVector?

I'm very thankful for any advice, as a lot of debugging time has gone into this one so far.

(Details: Changing the upper and lower limits of the integration, or subdivisions and rel.tol, did not produce any significant changes. The same holds for changes in the parameter b.)

Thanks.

EDIT: As kindly hinted by Christoph, I should add a reproducible example. Here is the C++ code; run it in RStudio (please make sure you have library(Rcpp) somewhere). As the data changed (10 data points), the plot looks a bit different, but the paradox remains exactly the same: the graphs look the same but the integral is totally off (by a factor of > 33). Again, manipulating subdivisions (from 22 to 10e7) or abs.tol (0.01 to 25) did not do the trick.

// This is the C++-code

#include <Rcpp.h>
#include <boost/math/special_functions/gamma.hpp>
using namespace Rcpp;

// C++ - Function: Standard Gamma Kernel
// [[Rcpp::export]]
NumericVector g_hat(double b, NumericVector u, NumericVector x){
  int n_arg = x.size();
  int size = u.size();
  NumericVector g(n_arg);

  // variable which approximates the area beneath the graph of the function g
  double integ = 0;

  // loop over domain of the function
  for(int j = 0; j < n_arg; j++){

    // loop over data
    for(int i = 0; i < size; i++){
      g[j] = g[j] + (1/(b*double(size)))*pow(u[i]/b,x[j]/b)*exp(-u[i]/b)/(tgamma(x[j]/b + 1));
    }
    integ += g[j];
  }

  // correction of the approximate integral (trapezoidal rule)
  integ = integ -0.5*g[n_arg - 1] -0.5*g[0];
  integ = integ*(x[n_arg-1] - x[0]);
  integ = integ/(double(n_arg));

  // normalization of g
  g = g/double(integ);
  return(g);
}

/*** R
# This is the R code
data <- c(1.98559739, 0.86798303, 0.48703074, 1.11475725, 1.69790403, 0.09901693, 0.21825991, 1.08029421, 0.60396438, 0.83915639)
grid <- seq(from = 0, to = 20, by = 0.01)

library(evmix)

# plot of the 2 function graphs
plot(grid,g_hat(x = grid, u = data, b = 0.5),type="l", main = "Comparison: g_hat (C++) and dbckden (R)",ylab="")
lines(grid,dbckden(x = grid, kerncentres = data, lambda = 0.5, bcmethod = "gamma1"),col="red")
legend("topright", legend=c("dbckden", "g_hat"), col = c("red","black"), lty = c(1,1))

# definite integrals
integrate(f = g_hat, u = data, b = 0.5, lower = 0, upper = 55)$value                                          # 33.58002
integrate(f = dbckden, kerncentres = data, lambda = 0.5, bcmethod = "gamma1", lower = 0, upper = 55)$value    # 1

*/
  • Are you sure the x-scale is counted correctly? By the way: without a reproducible example, it might be hard to help... – Christoph Aug 15 '16 at 10:06
  • Thank you for the input, Christoph! – 7shoe Aug 15 '16 at 10:46
  • The reproducible example has been added. ... The x-scale is a sequence from nearly 0 to 20 (with a small step size). At each x (i.e. x[j]) I need to compute the function value g[j], which is again a sum over the data points (u[0],...,u[size - 1]). I clarified this by commenting the code further. The integ variable is only a sum of the function values, so it gets updated at each iteration of the outer loop only. – 7shoe Aug 15 '16 at 10:57
  • That's three to four screenfuls of code of yours you expect us to wade through and inspect -- I doubt this will happen. Please try to reduce it to a [minimally reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) as we are all busy too. – Dirk Eddelbuettel Aug 15 '16 at 16:59
