
I am quite new to R and could not find an appropriate answer to my question, neither on the web nor in this forum, so I hope somebody can help clarify. I modelled some (simplified) regression trees with rpart (example regression tree), based on only 1 or 2 predictors, to analyse the connections between them and the dependent variable. Here is the code I constructed after reading many threads and manuals; maybe it helps:

library(rpart)       # rpart(), rpart.control(), prune()
library(rpart.plot)  # rpart.plot()

mydata <- GWM_regression
df <- data.frame(mydata)

# Shuffle the rows, then drop rows with missing values
set.seed(123)
rows <- sample(nrow(df))
dfshuffle <- df[rows, ]
dfshuffle03 <- na.exclude(dfshuffle)

# 90/10 train/test split (index vector renamed so it does not mask sample())
set.seed(123)
train_idx <- sample.int(n = nrow(dfshuffle03), size = floor(0.9 * nrow(dfshuffle03)), replace = FALSE)
dfshuffle03_train <- dfshuffle03[train_idx, ]
dfshuffle03_test  <- dfshuffle03[-train_idx, ]

# Grow a shallow regression tree, then prune at the cp with the lowest cross-validated error
set.seed(123)
m3 <- rpart(Detection_rate ~ DIF_2018, data = dfshuffle03_train,
            method = "anova", control = rpart.control(minsplit = 20, cp = 0, maxdepth = 3))
bestcp_m3 <- m3$cptable[which.min(m3$cptable[, "xerror"]), "CP"]
m3.pruned <- prune(m3, cp = bestcp_m3)
rpart.plot(m3.pruned, digits = 3, extra = 101, fallen.leaves = TRUE, tweak = 1, branch = 1,
           varlen = -13)

The tree I got has several terminal nodes, and I would like to extract the elements in each of them to plot them on a map. I used 90% of the data set for training (maybe too much, I know), so not all elements of my original data are considered by the tree, but I need to know which ones are sorted into which subset.

Is there any way to extract these data (as .csv or .txt) from a regression tree model?
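To make the goal concrete, here is a minimal sketch of the kind of output I mean, not a confirmed solution. It assumes that the `where` component of an rpart fit indexes the rows of `frame`, and that the row names of `frame` are the node numbers shown in the plot. The data are simulated and only the column names `Detection_rate` and `DIF_2018` come from my real data; `id`, `node`, `set`, `node_tree` and the output file name are hypothetical. For rows that were not used in fitting, it uses a workaround: predicting from a copy of the tree whose fitted values are replaced by node numbers.

```r
library(rpart)

# Simulated stand-in for the real data (values are made up for illustration)
set.seed(123)
df <- data.frame(id = seq_len(200), DIF_2018 = runif(200, -1, 1))
df$Detection_rate <- ifelse(df$DIF_2018 > 0, 0.8, 0.3) + rnorm(200, sd = 0.05)

# 90/10 train/test split
train_idx <- sample.int(nrow(df), size = floor(0.9 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Same kind of tree as in the question: grow, then prune at the best cp
fit <- rpart(Detection_rate ~ DIF_2018, data = train, method = "anova",
             control = rpart.control(minsplit = 20, cp = 0, maxdepth = 3))
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
fit_pruned <- prune(fit, cp = best_cp)

# Training rows: 'where' gives the row of 'frame' for each observation's leaf;
# the row names of 'frame' are the node numbers
train$node <- as.integer(rownames(fit_pruned$frame))[fit_pruned$where]

# Rows not used for fitting (workaround): replace each node's fitted value
# with its node number in a copy of the tree, then predict from that copy
node_tree <- fit_pruned
node_tree$frame$yval <- as.numeric(rownames(node_tree$frame))
test$node <- as.integer(predict(node_tree, newdata = test))

# Combine both parts and export; 'set' records where each row came from
assigned <- rbind(cbind(train, set = "train"), cbind(test, set = "test"))
out_file <- file.path(tempdir(), "tree_nodes.csv")  # hypothetical file name
write.csv(assigned, out_file, row.names = FALSE)
```

With a table like `assigned`, something like `subset(assigned, node == 3)` would give me the elements of one terminal node, which I could then join to my map data by `id`.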

  • Welcome to SO Lightning. Please share the code you tried so far. – Mike Poole Jan 14 '21 at 14:38
  • I am not sure I am following you. Are you trying to add an indicator variable showing which node each person ends up in? – itsMeInMiami Jan 14 '21 at 14:42
  • Thank you for the welcome @Mike Poole. I added the code I used for constructing the regression tree. – lightning87 Jan 14 '21 at 15:05
  • @itsMeInMiami: No, this is not what I was trying to express. The regression tree sorts the elements (with the detection rate as dependent variable) into certain terminal nodes, based on DIF_sales as the independent variable. I would like to know which elements have ended up in which node, so that I can see which elements from my training data set were assigned to the node with a very high detection rate and investigate them further. Is that understandable? Sorry for misleading you – lightning87 Jan 14 '21 at 15:12
