0

After looking at this question: Numeric comparison difficulty in R

I'm still stuck, because I'm depending on an equality comparison that is deep down in some function that I can't edit (or can I?)

I test in a local environment whether three numbers sum to one (sum(p)==1 --> TRUE), but when i pass this vector of three numbers to another function, a similar equality test is failing - which makes me think that the numbers are being changed as they pass from one function to the next - is this possible?

More detail: I'm trying to 'optimize' the prior probabilities that feed into a CART model, using an optimizer (dfoptim package, nmkb) to choose combinations of priors, sending them to the rpart package for model fitting, then the verification (rps function) package for scoring - but somewhere in the rpart package, my prior probabilities are throwing an error because rpart thinks that they don't sum to 1.

Here's a reproducible example:

require('rpart')
require('verification')
require('dfoptim')
data(iris)
set.seed(1)
tmp1 <- paste0(names(iris),collapse="+")
tmp2 <- gsub("\\+Species","",tmp1)
fmlatext <- paste0("Species~",tmp2)
tree <- rpart(as.formula(fmlatext),data=iris,method="class")
objfun <- function(priors,fmlatext,data){
  p <- priors/sum(priors) # turn arbitrary threesome into numbers that sum to 1
  p[1] <- 1-(sum(p)-p[1]) # ensure that numbers sum to 1
  print(c(p,sum(p)),digits=16)
  tree <- rpart(as.formula(fmlatext),data=data,parms=list(prior=p),
                method="class") 
  rpst <- rps(data$Species,predict(tree,data=data))
  return(rpst$rpss)
}
nlev <- nlevels(iris$Species)
guess <- seq(nlev)*10
lb <- rep(1,nlev)
ub <- rep(100,nlev)
bestpriors <- nmkb(par=guess,fn=objfun,lower=lb,upper=ub,
                   control=list(maximize=TRUE),fmlatext=fmlatext,data=iris)

Running this code gives me this output:

[1] 0.1666666666666667 0.3333333333333333 0.5000000000000000 1.0000000000000000
[1] 0.4353687449261023 0.2354416940871099 0.3291895609867877 1.0000000000000000
[1] 0.1224920651311070 0.5548713793562775 0.3226365555126156 1.0000000000000000
[1] 0.1268712138061573 0.2390044736120877 0.6341243125817551 1.0000000000000000
[1] 0.35141687748184969 0.57028058689316308 0.07830253562498726 1.00000000000000000
[1] 0.2997590406445614 0.5077659444797995 0.1924750148756391 1.0000000000000000
[1] 0.3598141573675122 0.4350423262345758 0.2051435163979119 0.9999999999999999
Error in get(paste("rpart", method, sep = "."), envir = environment())(Y,  : 
  Priors must sum to 1

In my real code, this happens inconsistently, depending on the data and guess value, but it does happen, and is a real pain.

How can I get past this error? Cheers, R

Community
  • 1
  • 1
RyanStochastic
  • 3,963
  • 5
  • 17
  • 24
  • What happens if you truncate your input (`p`) to, say, 3 or 4 digits of precision? Do the errors continue? – dg99 Nov 23 '13 at 00:21
  • what do you mean truncate? I've tried `p <- round(priors/sum(priors),2)` and still seen the error in practice. – RyanStochastic Nov 23 '13 at 00:36
  • Why not create an extra dummy variable , 'leftover<-1-sum(all_your_stuff)` – Carl Witthoft Nov 23 '13 at 00:55
  • Carl - I'm not sure how that would help - the problem is that i'm sending three numbers that DO sum to 1 [in the `objfun` environment] to `rpart`, but when `rpart` checks them, they no longer sum to 1. how would `leftover` help me? – RyanStochastic Nov 23 '13 at 00:59

1 Answers1

0

Potential Answer; not sure if it's robust or not yet, but I created this function and it's working for a few test cases that were failing for me before:

makeSumToOne <- function(vec) {
  p <- round(1024*vec/sum(vec),0)
  p[1] <- 1024-(sum(p)-p[1])
  p <- p/1024
  return(p)
}

and replace these lines in original code:

  p <- priors/sum(priors) # turn arbitrary threesome into numbers that sum to 1
  p[1] <- 1-(sum(p)-p[1]) # ensure that numbers sum to 1

with this one:

  p <- makeSumToOne(priors)

I was reading about precision and noted that 'powers of 2' came up quite often, so i thought working a 'power of 2' 2^10=1024 into my program might help...it's helping so far, but i doubt it's robust. I'm not counting this as answer unless no one comes up with a better one. (or an explanation of why this solution works, and solutions with the round, signif or other functions fail.)

RyanStochastic
  • 3,963
  • 5
  • 17
  • 24