This question is about ctree in the partykit package for R. While building a regression tree on a continuous response y with 28 continuous predictors x_1, x_2, ..., x_28, I ran into an error that probably comes from the splitting algorithm in ctree.

library(partykit)
ctrl1 = partykit::ctree_control(minsplit=100, minbucket=100, minprob=0.1, maxdepth=Inf)
dc3 = partykit::ctree(y~., data=as.data.frame(X3), control=ctrl1)

Error in interval.numeric(x, breaks = c(xmin - tol, ux, xmax)) : 
'breaks' are not unique

My predictors have relatively well-behaved distributions (histograms attached).

The question is: what can I do about it?

EDIT: To compare decision tree algorithms, I ran the same data through ctree from the party package and got no error at all. I also put it through rpart, again with no problem.

library(rpart)
library(party)
ctrl11 = party::ctree_control(minsplit=100, minbucket=100, maxdepth=0)
dc31 = party::ctree(y~., data=as.data.frame(X3), control=ctrl11)
X11();plot(dc31)

rpart.cont = list(maxdepth=3, usesurrogate=0, xval=0, cp=0.001, minsplit=10, minbucket=10)
dc3 = rpart(y~., data=as.data.frame(X3),method="anova", model=TRUE, control=rpart.cont)

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

(Image: histograms of the predictors)

horaceT
  • https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – AidanGawronski Dec 18 '18 at 17:14
  • Your session info in this case is unlikely to be helpful. Instead please try to find a small subsample of your data that still leads to this issue and then share the subsample via `dput`. – Julius Vainora Dec 18 '18 at 17:25
  • @AidanGawronski Added platform info. The code is as presented. The data is a bit too much to paste here. Anything else? – horaceT Dec 18 '18 at 17:25
  • @JuliusVainora Well, if I take the first 1000 rows, there is no error. Gradually adding more rows, the error doesn't show up until 2000. So I think it's really about the behavior of some of the covariates. – horaceT Dec 18 '18 at 17:28
  • By considering subsamples of different rows **and** covariates you are quite likely to find a solution yourself or at least to narrow it down enough to share the data... – Julius Vainora Dec 18 '18 at 17:32
  • dput the 2000 rows? That's not that much data. – AidanGawronski Dec 18 '18 at 17:49
  • Hi, the same happened to me. The problem is a floating-point issue. In my first dataset I had computed some fractions for some predictors. Afterwards the error occurred. If I round these fractions to e.g. 10 digits, the error disappears. – JBJ Jan 10 '19 at 11:46
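
The rounding workaround from the last comment could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix for the data in the question: `round_numeric` and `toy` are hypothetical names made up for illustration, and the choice of 10 digits simply follows the comment. It only shows how two values that differ by floating-point noise collapse into a single value after rounding.

```r
# Hypothetical helper (not part of partykit): round every numeric column
# of a data frame to a fixed number of digits, leaving other columns alone.
round_numeric <- function(df, digits = 10) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], round, digits = digits)
  df
}

# Toy stand-in for a predictor computed as a fraction; 0.1 + 0.2 and 0.3
# print the same but are stored as different doubles.
toy <- data.frame(ratio = c(0.1 + 0.2, 0.3, 0.5),
                  label = c("a", "b", "c"))

fixed <- round_numeric(toy)

# Number of distinct values before and after rounding.
n_before <- length(unique(toy$ratio))
n_after  <- length(unique(fixed$ratio))

# With the objects from the question (assumed to exist in your session),
# the call would look like this. Note that y is numeric and would be
# rounded as well:
# X3r <- round_numeric(as.data.frame(X3))
# dc3 <- partykit::ctree(y ~ ., data = X3r, control = ctrl1)
```

Rounding changes the data slightly, so it is worth checking that 10 digits is coarse enough to merge the near-duplicates but still fine enough for the scale of your predictors.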

0 Answers