Cross-posted from https://cs.stackexchange.com/questions/153558/find-a-range-of-values-to-subset-the-rows-to-maximize-the-objective-function?noredirect=1#comment323025_153558.
I have searched around for some time but couldn't find a similar example to my problem. It looks common enough that I would expect it to be solved. It lies between search and optimization/regression. The goal is to find a range of values for each feature, so that the subset of rows where every feature falls in the corresponding range maximizes the objective function. Assume we have a matrix with Yi and corresponding set of features Xi (say around 40). Number of samples relatively large, 100k+. Table example
So in this case for the total data Sum(Y_i) = 73 and the mean(Y_i)= 6.0833
The problem is to:
Max sum(Yi) subj to:
mean(Y_i) > 7$
sum(i) > 5000
, where i are the row index and rows are selected by imposing 2 constraints ( < and > ) or each feature.
I have managed to get solution using DEoptim in R for 5-6 variables with 2 conditions (partitions) "<" and ">". For more features it gets slow/fail to converge.
Seeing the (somewhat) similar question (and answer) here : Pandas find subset of rows minimizing the sum of a column under other column constraint
I am wondering if there is a way to formulate my problem in OR-Tools as well. I have went through the documentation on the https://developers.google.com/optimization but still struggle to understand how to express my problem.
Would appreciate any pointers as to how to formulate (solve) this problem in OR-tools in the general case, where there is a dataset with features + response variable and the objective is find the splits on features to maximize (minimize) the sum (or other function) of the response variable. The number of splits should be 2 per feature as we want solution to be locally monotonic wrt to features.
Thanks.