
Two fellow students and I are currently working with the optimization package pymoo. We have searched the documentation but are still struggling with some issues. The main challenge is to define and implement our problem, which consists of optimizing a protein-based classifier.

Our aim is to minimize the number of proteins and maximize the accuracy of the classifier (the two objectives). We have already implemented the optimization in MATLAB, but we would like to replicate it in Python.

In MATLAB we have:

X is an m-by-n feature matrix (m = samples, n = features), and Y is a multiclass label vector (classes 1, 2, 3, 4).

[m,n]=size(X);
folds=5;
Indices=k_fold(folds,m);
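k_fold is not a built-in MATLAB function and its body isn't shown, so its exact behavior is unknown here. Assuming it randomly assigns each of the m samples to one of `folds` folds of (nearly) equal size, a hypothetical NumPy equivalent could look like this (fold labels are 0-based, as is usual in Python):

```python
import numpy as np

def k_fold(folds, m, seed=0):
    """Hypothetical stand-in for the custom MATLAB k_fold(folds, m).

    Returns an array of length m whose i-th entry is the (0-based) fold
    that sample i belongs to; fold sizes differ by at most one.
    """
    rng = np.random.default_rng(seed)
    # Cycle through 0..folds-1 until there are m labels, then shuffle them
    labels = np.arange(m) % folds
    rng.shuffle(labels)
    return labels

m = 23
folds = 5
indices = k_fold(folds, m)
print(np.bincount(indices, minlength=folds))  # [5 5 5 4 4]
```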

First, we define the options for the problem: a multiobjective approach with a bitstring population, a population size of 50, and 5 generations. The problem is unconstrained.

options = optimoptions('gamultiobj','PopulationType','bitstring','PopulationSize',50,'PlotFcn',@gaplotpareto,'UseParallel',true,'Display','iter','MaxGenerations',5); 

fcn is the objective function, which receives the normalized feature matrix Xn and Y (Indices holds the k-fold cross-validation fold assignments):

fcn=@(Sol)ObjFunc(Sol,Xn,Y,Indices);
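The MATLAB ObjFunc itself isn't shown, so the following is only a hypothetical sketch of a Python counterpart. It assumes the two objectives are the number of selected features and the cross-validated error (1 - accuracy), both minimized, since gamultiobj and pymoo both minimize. A nearest-centroid classifier stands in for the real one; obj_func and the toy data are illustrative. The MATLAB anonymous function maps naturally to a Python lambda that closes over the data:

```python
import numpy as np

def obj_func(sol, Xn, Y, indices):
    """Hypothetical Python counterpart of ObjFunc(Sol, Xn, Y, Indices).

    sol     : 0/1 (or bool) vector, one entry per feature (protein)
    Xn      : (m, n) normalized feature matrix
    Y       : (m,) class labels
    indices : (m,) fold assignment of each sample
    Returns [number of selected features, cross-validated error rate].
    The nearest-centroid classifier is only a placeholder for the real one.
    """
    mask = np.asarray(sol).astype(bool)
    n_selected = int(mask.sum())
    if n_selected == 0:
        return np.array([0.0, 1.0])  # nothing selected: worst possible error

    X_sel = Xn[:, mask]
    correct = 0
    for k in np.unique(indices):
        test = indices == k
        train = ~test
        classes = np.unique(Y[train])
        # Class centroids computed on the training folds only
        centroids = np.array([X_sel[train & (Y == c)].mean(axis=0) for c in classes])
        # Squared distance of each test sample to each centroid -> nearest class
        d = ((X_sel[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[d.argmin(axis=1)]
        correct += int((pred == Y[test]).sum())
    accuracy = correct / len(Y)
    return np.array([float(n_selected), 1.0 - accuracy])

# Toy data (made up): 40 samples, 6 features, 4 classes, 5 folds
rng = np.random.default_rng(0)
Y = np.repeat([1, 2, 3, 4], 10)
Xn = rng.normal(size=(40, 6))
Xn[:, 0] += Y  # make feature 0 informative
indices = np.arange(40) % 5

# Equivalent of fcn=@(Sol)ObjFunc(Sol,Xn,Y,Indices)
fcn = lambda sol: obj_func(sol, Xn, Y, indices)
print(fcn([1, 0, 0, 0, 0, 0]))
```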

This gives us the output of the optimization: BestSol holds the best solutions found (the non-dominated set, one row per solution), and Fval is a two-column matrix with the two objective values of each of those solutions, one of them being the number of features used to classify.

[BestSol,Fval] = gamultiobj(fcn,n,[],[],[],[],[],[],options);

The goal is a binary vector (0s and 1s) indicating which features the optimized classifier should use.
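Once such a 0/1 vector is available, turning it into the selected proteins is a one-liner in NumPy. The vector and matrix below are made-up values; note that Python column indices are 0-based, unlike MATLAB's:

```python
import numpy as np

# A 0/1 solution vector like the ones the optimizer returns (made-up values)
best_sol = np.array([1, 0, 0, 1, 1, 0])
Xn = np.arange(12.0).reshape(2, 6)  # toy matrix: 2 samples x 6 features

selected = np.flatnonzero(best_sol)       # 0-based column indices of chosen proteins
X_reduced = Xn[:, best_sol.astype(bool)]  # keep only those columns
print(selected)         # [0 3 4]
print(X_reduced.shape)  # (2, 3)
```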

The whole first part of the code is:

[m,n]=size(X);
folds=5;
Indices=k_fold(folds,m);
options = optimoptions('gamultiobj','PopulationType','bitstring','PopulationSize',50,'PlotFcn',@gaplotpareto,'UseParallel',true,'Display','iter','MaxGenerations',5); 
fcn=@(Sol)ObjFunc(Sol,Xn,Y,Indices);
[BestSol,Fval] = gamultiobj(fcn,n,[],[],[],[],[],[],options);

In Python:

  1. We don't fully understand how to define our problem class properly. We have tried to follow some of the examples, but it is not clear to us how to define the objective function for our case.

This is the example from the pymoo docs (https://pymoo.org/problems/index.html#):

class SphereWithConstraint(Problem):

    def __init__(self):
        super().__init__(n_var=10, n_obj=1, n_constr=1, xl=0, xu=1)

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = np.sum((x - 0.5) ** 2, axis=1)
        out["G"] = 0.1 - out["F"]

In our case it would be something like this:

import numpy as np
from pymoo.model.problem import Problem


class ProteinClassifier(Problem):

    def __init__(self):
        super().__init__(n_var= *columns of X* , n_obj=2, n_constr=0)

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = ??????????
        out["G"] = ??????????

In this example (https://pymoo.org/customization/binary_problem.html) they create a binary single-objective optimization:

import numpy as np
from pymoo.algorithms.so_genetic_algorithm import GA
from pymoo.factory import get_crossover, get_mutation, get_sampling
from pymoo.optimize import minimize
from pymoo.problems.single.knapsack import create_random_knapsack_problem

problem = create_random_knapsack_problem(30)

algorithm = GA(
    pop_size=200,
    sampling=get_sampling("bin_random"),
    crossover=get_crossover("bin_hux"),
    mutation=get_mutation("bin_bitflip"),
    eliminate_duplicates=True)

res = minimize(problem,
               algorithm,
               ('n_gen', 100),
               verbose=False)

print("Best solution found: %s" % res.X.astype(int))
print("Function value: %s" % res.F)
print("Constraint violation: %s" % res.CV)

We want to obtain something similar to this output, but for a multiobjective problem:

Best solution found: [1 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0]

Function value: [-686]
Constraint violation: [0.]

Looking at the GitHub source code, we found the following, and it is not clear where the x that _evaluate receives as an argument is defined (_evaluate is declared as an abstract method of the Problem class):

def _evaluate(self, x, out, *args, **kwargs):
    out["F"] = -anp.sum(self.P * x, axis=1)
    out["G"] = (anp.sum(self.W * x, axis=1) - self.C)
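The x in this knapsack _evaluate is a 2-D NumPy array: one row per candidate solution, one column per item. Evaluating the two lines by hand on a small made-up instance (P, W and C stand in for the problem's self.P, self.W and self.C, and plain NumPy replaces autograd.numpy) shows that each produces one value per row:

```python
import numpy as np

# Made-up knapsack data: values P, weights W, capacity C for 5 items
P = np.array([10, 4, 7, 1, 3])
W = np.array([5, 2, 4, 1, 3])
C = 8

# What pymoo passes as x: 3 candidate solutions (rows) x 5 binary variables
x = np.array([[1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])

F = -np.sum(P * x, axis=1)     # negated total value per row (pymoo minimizes)
G = np.sum(W * x, axis=1) - C  # > 0 means the row exceeds the capacity
print(F)  # [-17 -12 -25]
print(G)  # [ 1 -1  7]
```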

The Problem source code looks like this:

def evaluate(self,
             X,
             *args,
             return_values_of="auto",
             return_as_dictionary=False,
             **kwargs):

    """
    Evaluate the given problem.
    The function values set as defined in the function.
    The constraint values are meant to be positive if infeasible. A higher positive values means "more" infeasible".
    If they are 0 or negative, they will be considered as feasible what ever their value is.

but _evaluate itself is only declared as an abstract method:

@abstractmethod
def _evaluate(self, x, out, *args, **kwargs):
    pass

The method _evaluate(self, x, out, *args, **kwargs) receives the argument x, but we haven't found where x is defined or where _evaluate is called. For this reason, we can't run our objective function, or even pass the feature matrix in as input.
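The missing link is that x is supplied by pymoo itself: during a run, the algorithm's evaluator calls problem.evaluate(X) with the current population matrix, and evaluate calls self._evaluate(X, out) with an empty out dictionary that your code fills in. The following is a heavily simplified, hypothetical imitation of that call chain (MiniProblem and CountOnes are not pymoo classes), only meant to show where x comes from:

```python
import numpy as np

class MiniProblem:
    """Simplified, hypothetical imitation of how pymoo's Problem wires
    evaluate() to _evaluate(); the real class also checks shapes,
    handles constraints, parallelization, etc."""

    def __init__(self, n_var):
        self.n_var = n_var

    def evaluate(self, X):
        out = {}                # empty dict that _evaluate fills in
        self._evaluate(X, out)  # X becomes the 'x' argument of _evaluate
        return out

    def _evaluate(self, x, out):
        raise NotImplementedError  # subclasses supply the objective

class CountOnes(MiniProblem):
    def _evaluate(self, x, out):
        out["F"] = x.sum(axis=1)  # one objective value per row of x

# The algorithm, not the user, creates X: here a random 0/1 population of 4 rows
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(4, 6))
print(CountOnes(n_var=6).evaluate(X)["F"])
```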

- How and where can we define the objective function?

- How is x passed in and handled in the examples?

  2. How do we choose the best parameters for the GA object?

We've been stuck on this point for the last two weeks. We would really appreciate your help.

  • We also read this [Answer](https://stackoverflow.com/questions/59444909/pymoo-run-multiobjective-function/62175387#62175387) from the main developer, but the _evaluate method paradigm still isn't clear to us. – Julio Cuadros Apr 13 '21 at 17:55

1 Answer


I found an implementation on their official page. Scroll down to the end of that page for more about the parameters and return values.

import numpy as np
import autograd.numpy as anp

from pymoo.model.problem import Problem

class MyProblem(Problem):

    def __init__(self, const_1=5, const_2=0.1):

        # define lower and upper bounds: 1-D arrays with length equal to the number of variables
        xl = -5 * anp.ones(10)
        xu = 5 * anp.ones(10)

        super().__init__(n_var=10, n_obj=1, n_constr=2, xl=xl, xu=xu, evaluation_of="auto")

        # store custom variables needed for evaluation
        self.const_1 = const_1
        self.const_2 = const_2

    def _evaluate(self, x, out, *args, **kwargs):
        f = anp.sum(anp.power(x, 2) - self.const_1 * anp.cos(2 * anp.pi * x), axis=1)
        g1 = (x[:, 0] + x[:, 1]) - self.const_2
        g2 = self.const_2 - (x[:, 2] + x[:, 3])

        out["F"] = f
        out["G"] = anp.column_stack([g1, g2])

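Basically, you create an instance of the MyProblem class and then call its evaluate method on a 2-D array x with one row per solution. You never call _evaluate yourself; evaluate calls it and fills in the out dictionary.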

problem = MyProblem()
x = np.random.uniform(-5, 5, size=(4, 10))  # 4 candidate solutions with 10 variables each
out = problem.evaluate(x, return_as_dictionary=True)

This returns a dict if `return_as_dictionary` is set to True. Alternatively, pass `return_values_of` a list with the names of the values you want returned:

F, G, CV = problem.evaluate(x, return_values_of=["F", "G", "CV"])

Allowed values for this argument are "F", "CV", "G", "dF", "dG", "dCV" and "feasible".

I am not really sure what you meant by the GA object, but I guess you mean g1 and g2, which are computed by applying the formulas above to the x argument.

user20210310
  • Maybe we weren't clear. We have already optimized the classifier in MATLAB (it works flawlessly). Our main goal is a correct translation of the MATLAB code we showed. To clarify, we don't have constraints, so `n_constr=0`; therefore we don't need to define xl and xu. The point is to declare the objective function of this problem as in MATLAB, where we have `fcn=@(Sol)ObjFunc(Sol,Xn,Y,Indices);`. How are we supposed to define this objective function in the pymoo problem? Let me know if this is clearer. – Julio Cuadros Apr 13 '21 at 15:56