I'm currently working with two fellow students on a project that uses the optimization package pymoo. We have searched the documentation, but we are still struggling to solve some issues. The main challenge is to define and implement our problem, which consists of optimizing a protein-based classifier.
Our aim is to minimize the number of proteins and maximize the accuracy of the classifier (the objective function). We have already implemented the optimization in MATLAB, but we would like to replicate it in Python.
In MATLAB we have:
X is an m×n feature matrix (m = samples, n = features), and Y is a multiclass label vector (classes 1, 2, 3, 4).
```matlab
[m,n]=size(X);
folds=5;
Indices=k_fold(folds,m);
```
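A possible Python counterpart of the fold assignment, as a sketch: `k_fold` is not shown in the question, so this assumes it returns one fold label per sample. The helper name `make_fold_indices` is hypothetical; it shuffles the samples once and deals them round-robin into the folds. If scikit-learn is available, `sklearn.model_selection.StratifiedKFold` does a similar job while also balancing the classes.

```python
import numpy as np


def make_fold_indices(folds, m, seed=0):
    """Assign each of m samples a fold label in 0..folds-1.

    Hypothetical stand-in for the MATLAB helper k_fold(folds, m):
    samples are shuffled once and dealt round-robin, so fold sizes
    differ by at most one.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(m) % folds


indices = make_fold_indices(folds=5, m=23)
print(np.bincount(indices).tolist())  # samples per fold: [5, 5, 5, 4, 4]
```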
First, we define the options for the problem: a multiobjective approach, a population size of 50 and 5 generations. The problem is unconstrained.
```matlab
options = optimoptions('gamultiobj','PopulationType','bitstring','PopulationSize',50,'PlotFcn',@gaplotpareto,'UseParallel',true,'Display','iter','MaxGenerations',5);
```
fcn is the objective function, which receives the normalized feature matrix Xn and Y (Indices comes from the k-fold cross-validation split):
```matlab
fcn=@(Sol)ObjFunc(Sol,Xn,Y,Indices);
```
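In Python, the anonymous function handle maps naturally onto `functools.partial`, which binds the data and leaves only the bit string open. `ObjFunc` itself is not shown in the question, so the classifier below is a hypothetical nearest-centroid stand-in and the data is synthetic; the names `obj_func` and `nearest_centroid_accuracy` are illustrative only. Both objectives are returned in minimization form: the number of selected features, and the cross-validated error (1 − accuracy).

```python
from functools import partial

import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    # Hypothetical classifier: assign each test sample to the class whose
    # training-set mean (centroid) is closest in Euclidean distance.
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    y_pred = classes[np.argmin(dists, axis=1)]
    return np.mean(y_pred == y_test)


def obj_func(sol, Xn, Y, indices):
    """Sketch of ObjFunc(Sol, Xn, Y, Indices): returns [n_features, cv_error]."""
    mask = np.asarray(sol, dtype=bool)
    n_selected = int(mask.sum())
    if n_selected == 0:
        # No features selected: report the worst possible error.
        return [0.0, 1.0]
    accs = []
    for k in np.unique(indices):
        test = indices == k  # samples of fold k form the test set
        accs.append(nearest_centroid_accuracy(Xn[~test][:, mask], Y[~test],
                                              Xn[test][:, mask], Y[test]))
    return [float(n_selected), 1.0 - float(np.mean(accs))]


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
Xn = rng.normal(size=(40, 6))
Y = np.repeat([1, 2, 3, 4], 10)
Xn[:, 0] += Y  # make feature 0 informative
indices = rng.permutation(40) % 5

# Equivalent of fcn = @(Sol) ObjFunc(Sol, Xn, Y, Indices)
fcn = partial(obj_func, Xn=Xn, Y=Y, indices=indices)
print(fcn([1, 0, 0, 0, 0, 0]))
```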
In this way we get the output of the optimization: BestSol contains the best solutions found (one row per Pareto-optimal solution), and Fval is a two-column matrix with the corresponding objective values, including the number of features used to classify.
```matlab
[BestSol,Fval] = gamultiobj(fcn,n,[],[],[],[],[],[],options);
```
The aim is to get a binary vector (0s and 1s) indicating the best features for the optimized classifier.
The whole first part of the code is:
```matlab
[m,n]=size(X);
folds=5;
Indices=k_fold(folds,m);
options = optimoptions('gamultiobj','PopulationType','bitstring','PopulationSize',50,'PlotFcn',@gaplotpareto,'UseParallel',true,'Display','iter','MaxGenerations',5);
fcn=@(Sol)ObjFunc(Sol,Xn,Y,Indices);
[BestSol,Fval] = gamultiobj(fcn,n,[],[],[],[],[],[],options);
```
In Python:
- We don't fully understand how to define our problem class properly. We have tried to follow some of the examples, but it is not clear to us how to define the objective function for our case.
This is the example from the pymoo docs: https://pymoo.org/problems/index.html#
```python
import numpy as np
from pymoo.model.problem import Problem

class SphereWithConstraint(Problem):
    def __init__(self):
        super().__init__(n_var=10, n_obj=1, n_constr=1, xl=0, xu=1)

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = np.sum((x - 0.5) ** 2, axis=1)
        out["G"] = 0.1 - out["F"]
```
In our case it would be something like this:
```python
import numpy as np
from pymoo.model.problem import Problem

class ProteinClassifier(Problem):
    def __init__(self):
        super().__init__(n_var=...,  # number of columns of X
                         n_obj=2, n_constr=0)

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = ...  # ??????????
        out["G"] = ...  # ??????????
```
In this example they create a binary single-objective optimization: https://pymoo.org/customization/binary_problem.html
```python
import numpy as np
from pymoo.algorithms.so_genetic_algorithm import GA
from pymoo.factory import get_crossover, get_mutation, get_sampling
from pymoo.optimize import minimize
from pymoo.problems.single.knapsack import create_random_knapsack_problem

problem = create_random_knapsack_problem(30)

algorithm = GA(
    pop_size=200,
    sampling=get_sampling("bin_random"),
    crossover=get_crossover("bin_hux"),
    mutation=get_mutation("bin_bitflip"),
    eliminate_duplicates=True)

res = minimize(problem,
               algorithm,
               ('n_gen', 100),
               verbose=False)

print("Best solution found: %s" % res.X.astype(int))
print("Function value: %s" % res.F)
print("Constraint violation: %s" % res.CV)
```
We want to do something similar, but multiobjective. The single-objective example prints:
```
Best solution found: [1 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0]
Function value: [-686]
Constraint violation: [0.]
```
Looking at the source code on GitHub, we found the following, but it is not clear where the x that _evaluate receives as an argument is defined (_evaluate seems to be an abstract method of the Problem class):
```python
def _evaluate(self, x, out, *args, **kwargs):
    out["F"] = -anp.sum(self.P * x, axis=1)
    out["G"] = (anp.sum(self.W * x, axis=1) - self.C)
```
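The key to reading that snippet is the shape of `x`: in a vectorized problem it holds the whole population, one row per individual, so `self.P * x` broadcasts the profit vector over every row and `axis=1` yields one value per individual. The toy numbers below are hypothetical; `anp` in that source file is autograd's NumPy wrapper, and plain NumPy behaves the same for these operations.

```python
import numpy as np

# Hypothetical toy knapsack: 4 items with profits P, weights W, capacity C.
P = np.array([10, 5, 8, 3])
W = np.array([4, 2, 3, 1])
C = 6

# x as a vectorized _evaluate receives it: one row (bit string) per individual.
x = np.array([[1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1]])

F = -np.sum(P * x, axis=1)     # P broadcasts over rows: one (negated) profit per row
G = np.sum(W * x, axis=1) - C  # > 0 means that row exceeds the capacity (infeasible)
print(F.tolist())  # [-13, -13, -26]
print(G.tolist())  # [-1, -1, 4]
```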
The Problem source code works like this:
```python
def evaluate(self,
             X,
             *args,
             return_values_of="auto",
             return_as_dictionary=False,
             **kwargs):
    """
    Evaluate the given problem.
    The function values set as defined in the function.
    The constraint values are meant to be positive if infeasible. A higher positive values means "more" infeasible".
    If they are 0 or negative, they will be considered as feasible what ever their value is.
```
but then _evaluate itself is only declared with the @abstractmethod decorator:
```python
@abstractmethod
def _evaluate(self, x, out, *args, **kwargs):
    pass
```
The method _evaluate(self, x, out, *args, **kwargs) receives the argument x, but we haven't found where x is defined or where _evaluate is called. For this reason, we can't run our objective function, or even pass the feature matrix as input.
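Roughly, `x` is the current population: the algorithm creates it (for example from the sampling operator), calls the problem's public `evaluate` method with it, and `evaluate` passes it on to `_evaluate` together with an empty `out` dictionary that the subclass fills in. `@abstractmethod` only marks `_evaluate` as a method every subclass must override. The classes below are a heavily simplified, hypothetical illustration of that pattern, not pymoo's actual code, which routes the call through more internal helpers.

```python
from abc import ABC, abstractmethod

import numpy as np


class MiniProblem(ABC):
    """Simplified illustration of the evaluate/_evaluate pattern (NOT pymoo's code)."""

    def __init__(self, n_var, n_obj):
        self.n_var, self.n_obj = n_var, n_obj

    def evaluate(self, X):
        # Public entry point: the algorithm calls this with the current population.
        X = np.atleast_2d(X)
        out = {}                # empty dict that the subclass fills in
        self._evaluate(X, out)  # the population X arrives in _evaluate as `x`
        return out

    @abstractmethod
    def _evaluate(self, x, out, *args, **kwargs):
        ...


class CountOnes(MiniProblem):
    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = x.sum(axis=1)  # one objective value per row of x


population = np.array([[1, 0, 1], [0, 0, 1]])  # in pymoo this would come from sampling
print(CountOnes(n_var=3, n_obj=1).evaluate(population)["F"].tolist())  # [2, 1]
```

So the objective function belongs inside `_evaluate` of your own subclass, and any data it needs (such as the feature matrix) can be stored on the instance in `__init__`.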
- How and where can we define the objective function?
- How is x passed in and handled in the examples?
- How do we choose the best parameters for the GA object?
We've been stuck at this point for the last two weeks. We would really appreciate your help.