OR-Tools solution to partition the data so that subset of rows where every feature falls in the corresponding range maximizes the objective function

Question

Cross-posted from https://cs.stackexchange.com/questions/153558/find-a-range-of-values-to-subset-the-rows-to-maximize-the-objective-function?noredirect=1#comment323025_153558.

I have searched around for some time but couldn't find a similar example to my problem. It looks common enough that I would expect it to be solved. It lies between search and optimization/regression. The goal is to find a range of values for each feature, so that the subset of rows where every feature falls in the corresponding range maximizes the objective function. Assume we have a matrix with Yi and corresponding set of features Xi (say around 40). Number of samples relatively large, 100k+. Table example

So in this case for the total data Sum(Y_i) = 73 and the mean(Y_i)= 6.0833

The problem is to:

Max sum(Yi) subj to:

mean(Y_i) > 7$

sum(i) > 5000

, where i are the row index and rows are selected by imposing 2 constraints ( < and > ) or each feature.

I have managed to get solution using DEoptim in R for 5-6 variables with 2 conditions (partitions) "<" and ">". For more features it gets slow/fail to converge.

Seeing the (somewhat) similar question (and answer) here : Pandas find subset of rows minimizing the sum of a column under other column constraint

I am wondering if there is a way to formulate my problem in OR-Tools as well. I have went through the documentation on the https://developers.google.com/optimization but still struggle to understand how to express my problem.

Would appreciate any pointers as to how to formulate (solve) this problem in OR-tools in the general case, where there is a dataset with features + response variable and the objective is find the splits on features to maximize (minimize) the sum (or other function) of the response variable. The number of splits should be 2 per feature as we want solution to be locally monotonic wrt to features.

Thanks.

Perhaps useful https://stackoverflow.com/questions/63377180/finding-ideal-filter-setting-to-maximize-target-function/63429510#63429510 — Enrico Schumann, Aug 22 '22 at 13:06
Most likely, the technique to use will be a MIP solver. There should be plenty of literature on this if it can be applied. — Laurent Perron, Aug 22 '22 at 17:08
this seems deterministic: order by `Y` take all values larger than 7 and keep adding until mean falls below 7. If length of selected values is higher than 5000 you have global maximum otherwise there is no feasible solution. — det, Aug 26 '22 at 13:26

OR-Tools solution to partition the data so that subset of rows where every feature falls in the corresponding range maximizes the objective function

0 Answers0