I am working with R and am trying to follow this tutorial on function optimization: https://rpubs.com/Argaadya/bayesian-optimization
For this example, I first generate some random data:

    # load libraries
    library(dplyr)

    # create some data for this example
    a1 = rnorm(1000, 100, 10)
    b1 = rnorm(1000, 100, 5)
    c1 = sample.int(1000, 1000, replace = TRUE)
    train_data = data.frame(a1, b1, c1)

From here, I define the function that I want to optimize ("fitness"). This function takes 7 inputs and calculates a "total" mean (a single scalar value). The inputs required for this function are:
- "random_1" (between 80 and 120)
- "random_2" (between "random_1" and 120)
- "random_3" (between 85 and 120)
- "random_4" (between "random_2" and 120)
- "split_1" (between 0 and 1)
- "split_2" (between 0 and 1)
- "split_3" (between 0 and 1)
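
The ordering constraints on `random_2` and `random_4` cannot be written as fixed numeric lower/upper bounds, because each one depends on another input. A common workaround, offered here as an assumption rather than anything taken from the tutorial, is to search over a fraction of the remaining range and map it back to the original input. `gap_2`, `gap_4` and `map_inputs()` are hypothetical names; the objective handed to the optimizer would call a helper like this before binning the data, so the constraints hold by construction.

```r
# Hypothetical helper: map box-bounded search parameters to the constrained
# inputs of fitness(). gap_2 and gap_4 are assumed to lie in [0, 1]:
#   random_2 = random_1 + gap_2 * (120 - random_1)  -> in [random_1, 120]
#   random_4 = random_2 + gap_4 * (120 - random_2)  -> in [random_2, 120]
map_inputs <- function(random_1, gap_2, random_3, gap_4) {
  random_2 <- random_1 + gap_2 * (120 - random_1)
  random_4 <- random_2 + gap_4 * (120 - random_2)
  c(random_1 = random_1, random_2 = random_2,
    random_3 = random_3, random_4 = random_4)
}

mapped <- map_inputs(random_1 = 90, gap_2 = 0.5, random_3 = 100, gap_4 = 0.5)
mapped  # random_1 = 90, random_2 = 105, random_3 = 100, random_4 = 112.5
```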
The function to optimize ("fitness") is defined as follows:

    # define fitness function: returns a single scalar value (the "total" mean)
    fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {

        # bin data according to random criteria
        train_data <- train_data %>%
            mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a",
                         ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
        train_data$cat = as.factor(train_data$cat)

        # new splits
        a_table = train_data %>%
            filter(cat == "a") %>%
            select(a1, b1, c1, cat)
        b_table = train_data %>%
            filter(cat == "b") %>%
            select(a1, b1, c1, cat)
        c_table = train_data %>%
            filter(cat == "c") %>%
            select(a1, b1, c1, cat)

        # calculate quantile ("quant") for each bin
        table_a = data.frame(a_table %>% group_by(cat) %>%
            mutate(quant = quantile(c1, prob = split_1)))
        table_b = data.frame(b_table %>% group_by(cat) %>%
            mutate(quant = quantile(c1, prob = split_2)))
        table_c = data.frame(c_table %>% group_by(cat) %>%
            mutate(quant = quantile(c1, prob = split_3)))

        # create a new variable ("diff") that records whether the quantile is bigger than the value of "c1"
        table_a$diff = ifelse(table_a$quant > table_a$c1, 1, 0)
        table_b$diff = ifelse(table_b$quant > table_b$c1, 1, 0)
        table_c$diff = ifelse(table_c$quant > table_c$c1, 1, 0)

        # combine all tables
        final_table = rbind(table_a, table_b, table_c)

        # calculate the total mean: this is what needs to be optimized
        mean = mean(final_table$diff)
    }

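
As posted, `fitness()` ends with the assignment `mean = mean(...)`, so it returns that number invisibly, as a bare scalar. The rBayesianOptimization documentation says `FUN` should return a named list whose first component `Score` is the value to maximize and whose second component `Pred` holds predictions used for ensembling. A minimal sketch, assuming `Pred = 0` is an acceptable placeholder when nothing is being ensembled; `fitness_bo()` is a hypothetical name, and its base-R body is meant to reproduce the dplyr logic above (every row lands in exactly one bin, so processing bins separately does not change the mean). The data are regenerated only so the snippet runs on its own.

```r
# Example data, generated the same way as in the question
a1 <- rnorm(1000, 100, 10)
b1 <- rnorm(1000, 100, 5)
c1 <- sample.int(1000, 1000, replace = TRUE)
train_data <- data.frame(a1, b1, c1)

# Sketch: same binning / quantile logic as fitness(), written in base R,
# but returning the named list that rBayesianOptimization documents for FUN.
fitness_bo <- function(random_1, random_2, random_3, random_4,
                       split_1, split_2, split_3) {
  # assign each row to bin "a", "b" or "c" with the original rule
  bin <- ifelse(train_data$a1 <= random_1 & train_data$b1 <= random_3, "a",
                ifelse(train_data$a1 <= random_2 & train_data$b1 <= random_4,
                       "b", "c"))
  splits <- c(a = split_1, b = split_2, c = split_3)
  diff <- numeric(nrow(train_data))
  for (k in names(splits)) {
    in_k <- bin == k
    if (any(in_k)) {  # empty bins contribute no rows, as with rbind()
      # quantile of c1 within the bin, then 1 where it exceeds c1
      q <- quantile(train_data$c1[in_k], probs = splits[[k]])
      diff[in_k] <- as.numeric(q > train_data$c1[in_k])
    }
  }
  # Score is the value to maximize; Pred is a placeholder (nothing to stack)
  list(Score = mean(diff), Pred = 0)
}

res <- fitness_bo(random_1 = 100, random_2 = 110, random_3 = 100,
                  random_4 = 115, split_1 = 0.5, split_2 = 0.5, split_3 = 0.5)
str(res)
```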
**Goal:** I now want to use the Bayesian optimization algorithm from this tutorial (https://rpubs.com/Argaadya/bayesian-optimization) to find the values of these 7 inputs that produce the largest value of "mean".
First, define the "search bound":

    # define search bounds for the 7 inputs
    library(rBayesianOptimization)

    random_1 = NULL
    random_2 = NULL
    random_3 = NULL
    random_4 = NULL
    split_1 = NULL
    split_2 = NULL
    split_3 = NULL

    search_bound <- list(random_1 = c(80, 120), random_2 = c(random_1, 120),
                         random_3 = c(85, 120), random_4 = c(random_2, 120),
                         split_1 = c(0, 1), split_2 = c(0, 1), split_3 = c(0, 1))

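
rBayesianOptimization documents `bounds` as a named list of lower and upper bounds whose names match the arguments of `FUN`, so every entry needs to be a pair of fixed numbers. A bound cannot depend on another parameter, which is what `c(random_1, 120)` tries to do; and since `random_1` is `NULL` at that point, the expression evaluates to just `120`. One way around this, sketched as an assumption rather than the tutorial's method, is to search over hypothetical fractions `gap_2` and `gap_4` in [0, 1] and recover `random_2 = random_1 + gap_2 * (120 - random_1)` and `random_4 = random_2 + gap_4 * (120 - random_2)` inside the objective.

```r
# Hypothetical reparameterized bounds: every entry is a fixed c(lower, upper).
# gap_2 and gap_4 replace random_2 and random_4; the objective would map them
# back with random_2 = random_1 + gap_2 * (120 - random_1), and likewise for random_4.
search_bound <- list(random_1 = c(80, 120),
                     gap_2    = c(0, 1),
                     random_3 = c(85, 120),
                     gap_4    = c(0, 1),
                     split_1  = c(0, 1),
                     split_2  = c(0, 1),
                     split_3  = c(0, 1))

# Sanity check before calling BayesianOptimization():
# exactly two numbers per parameter, with lower < upper
stopifnot(all(lengths(search_bound) == 2),
          all(vapply(search_bound, function(b) b[1] < b[2], logical(1))))
```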
Second, set the initial sample:

    # set initial sample
    set.seed(123)
    search_grid <- data.frame(random_1 = runif(20, 80, 120),
                              random_2 = runif(20, random_1, 120),
                              random_3 = runif(20, 85, 120),
                              random_4 = runif(20, random_2, 120),
                              split_1 = runif(20, 0, 1),
                              split_2 = runif(20, 0, 1),
                              split_3 = runif(20, 0, 1))

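
Two details of this grid are worth checking. Inside `data.frame(...)`, `random_1 = runif(...)` only names a column; it does not create a variable, so `runif(20, random_1, 120)` on the next line uses whatever `random_1` is in the calling environment (here the `NULL` assigned earlier), not the column just generated. Also, per the package documentation, `init_grid_dt` should have the same column names as `bounds`, with every point inside the bounds. A sketch assuming the hypothetical `gap_2`/`gap_4` reparameterization (fractions in [0, 1] mapped back to `random_2` and `random_4`), in which no column depends on another:

```r
set.seed(123)
n_init <- 20
# Initial design on box-bounded parameters; no column depends on another,
# so the order in which data.frame() evaluates its arguments does not matter.
# gap_2 / gap_4 are hypothetical fractions in [0, 1].
search_grid <- data.frame(random_1 = runif(n_init, 80, 120),
                          gap_2    = runif(n_init, 0, 1),
                          random_3 = runif(n_init, 85, 120),
                          gap_4    = runif(n_init, 0, 1),
                          split_1  = runif(n_init, 0, 1),
                          split_2  = runif(n_init, 0, 1),
                          split_3  = runif(n_init, 0, 1))

# The implied original inputs still satisfy the ordering constraints
random_2 <- with(search_grid, random_1 + gap_2 * (120 - random_1))
random_4 <- random_2 + search_grid$gap_4 * (120 - random_2)
stopifnot(all(random_2 >= search_grid$random_1),
          all(random_4 >= random_2), all(random_4 <= 120))
```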
Finally, run the Bayesian Optimization algorithm:

    # run the Bayesian optimization algorithm
    set.seed(1)
    bayes_finance_ei <- BayesianOptimization(FUN = fitness, bounds = search_bound,
                                             init_grid_dt = search_grid, init_points = 0,
                                             n_iter = 10, acq = "ei")

But this produces the following error:

    Error in FUN(X[[i]], ...) : subscript out of bounds

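
The message itself can be reproduced without the package. Assuming rBayesianOptimization reads each upper bound with a `sapply()` over `bounds` that takes element 2 of every entry (its source appears to use `magrittr::extract2`, an alias of `[[`), any entry holding a single number makes that step fail with this message, because `c(NULL, 120)` is just `120`. Left uncaught, the `sapply()` call below should print the same `Error in FUN(X[[i]], ...) : subscript out of bounds`.

```r
random_1 <- NULL  # as in the question
random_2 <- NULL
# c(NULL, 120) silently drops the NULL, leaving a single number
bounds <- list(random_1 = c(80, 120),
               random_2 = c(random_1, 120),
               random_4 = c(random_2, 120))
lengths(bounds)  # random_1 = 2, random_2 = 1, random_4 = 1

# Taking element 2 (the upper bound) of every entry fails on the short ones
msg <- tryCatch(sapply(bounds, `[[`, 2), error = conditionMessage)
msg  # "subscript out of bounds"
```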
Can someone please show me what I am doing wrong? I thought I had followed all the necessary steps from the tutorial.
Thanks