In my master's program I am trying to implement a decision tree. At some point I therefore have a vector of the sorted, unique values of a variable, e.g.

sorted_unique <- c(1, 3, 5, 7)

Now, in the next step, I am looking for all splitting points: I want to obtain the mean of each pair of adjacent values in the original vector.

splits <- double(length(sorted_unique) - 1)

for (i in seq_along(splits)) {
  splits[i] <- mean(sorted_unique[i:(i+1)])
}

This indeed yields the desired result:

> splits
[1] 2 4 6

However, since I have to run this procedure many times, I would like to know whether there is a more efficient way to implement it.

Kind regards

Duesser

2 Answers

One option could be:

sapply(seq_along(sorted_unique), function(x) mean(sorted_unique[c(x, x + 1)]))[-length(sorted_unique)]

[1] 2 4 6
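
A fully vectorized variant of the same idea, as a minimal sketch: it assumes `sorted_unique` is a sorted numeric vector, and it adds the vector without its last element to the vector without its first element, then halves the sum. No loop or `sapply()` call is involved, so it is likely to be faster on long vectors, though I have not benchmarked it.

```r
sorted_unique <- c(1, 3, 5, 7)

# head(x, -1) drops the last element, tail(x, -1) drops the first one,
# so the element-wise sum pairs each value with its right-hand neighbour.
splits <- (head(sorted_unique, -1) + tail(sorted_unique, -1)) / 2

splits
# [1] 2 4 6
```

For a vector of length 1 this returns an empty numeric vector, which seems like the sensible result when there is nothing to split.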
tmfmnk

Taking into account this question:

How can I efficiently obtain a vector with values that are between the original vector's values?

And taking into account that you have (as a starting point) a sorted vector of unique values, you can try this:

sorted_unique <- c(1, 3, 5, 7)
all_values <- sorted_unique[[1]]:sorted_unique[[length(sorted_unique)]]
between <- all_values[!all_values %in% sorted_unique]
gss
  • This does not yield the desired result. I would like to get only one split point between each pair of values. If I input `sorted_unique <- seq(1,13,by = 4)` as an example, I get too many values in return. I am also looking for an approach that works on double vectors. – Duesser Nov 21 '21 at 11:51
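
To make the objection in the comment concrete, here is a small sketch that runs the code from this answer on the comment's example and compares it with the midpoints the question asks for. The `midpoints` line is my own illustration, not part of either answer; it assumes a sorted numeric vector and uses `diff()` to get the gap to the next value.

```r
sorted_unique <- seq(1, 13, by = 4)   # 1 5 9 13

# Code from this answer: every integer in the range that is not in the input.
all_values <- sorted_unique[[1]]:sorted_unique[[length(sorted_unique)]]
between <- all_values[!all_values %in% sorted_unique]
between
# [1]  2  3  4  6  7  8 10 11 12

# One split point per adjacent pair: each value plus half the gap to the next.
midpoints <- sorted_unique[-length(sorted_unique)] + diff(sorted_unique) / 2
midpoints
# [1]  3  7 11
```

Because the `diff()` version never builds an integer sequence, it should also work on non-integer input such as `c(0.5, 1.25, 2)`.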