How to ensure the output of _best_programs of SymbolicTransformer of gplearn is different?

Question

I am using the SymbolicTransformer of gplearn to generate some automated features. The issue is that when I inspect the expressions of the features by looking at _best_programs after fitting, I find that most of the features have the same expression. Is there a way to ensure that SymbolicTransformer outputs different features after fitting?
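One way to measure the problem before tuning anything is to count how many distinct expressions the fitted transformer actually holds. The sketch below assumes that calling `str()` on each entry of `_best_programs` returns its formula (for example `add(X0, X1)`), which is how gplearn's program objects render. Keep in mind that `_best_programs` is a private attribute and may change between gplearn versions. The helper names `expression_counts` and `distinct_ratio` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def expression_counts(programs: Iterable[object]) -> List[Tuple[str, int]]:
    """Hypothetical helper: count each expression string, most common first."""
    # Assumes str() of a gplearn program yields its formula, e.g. "add(X0, X1)"
    counts = Counter(str(p) for p in programs)
    return counts.most_common()


def distinct_ratio(programs: Iterable[object]) -> float:
    """Hypothetical helper: number of distinct expressions divided by number of programs."""
    exprs = [str(p) for p in programs]
    if not exprs:
        return 0.0
    return len(set(exprs)) / len(exprs)
```

After fitting, `distinct_ratio(transformer._best_programs)` equals 1.0 when every feature is unique and drops toward `1 / n_components` when nearly all features share the same expression.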

score 0 · Answer 1 · answered Jun 02 '20 at 20:34

I don't know of a way to enforce this explicitly, but you can try to encourage more diverse populations in each generation, in the hope that this leads to a more diverse collection of _best_programs. In my opinion, a few parameters worth looking into are:

p_crossover
p_subtree_mutation
p_hoist_mutation
p_point_mutation
p_point_replace

If you increase the probability of crossover or mutation, you will increase the expected diversity, but you must not overdo it. There is a balance between a diverse population and an accurate one: the higher the crossover or mutation rates, the more likely you are to take a strong candidate and turn it into something meaningless.
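When adjusting these parameters, note that gplearn expects `p_crossover`, `p_subtree_mutation`, `p_hoist_mutation` and `p_point_mutation` to total 1.0 or less. The remainder is the chance that a program is reproduced unchanged. `p_point_replace` is different: it is a per-node rate used only by point mutation, so it sits outside that budget. The hypothetical helpers below (`genetic_params`, `reproduction_rate`) are a sketch for building a keyword-argument dict that respects this constraint. Their defaults mirror gplearn's own defaults. They do not set any value for you, and the right balance still depends on your data.

```python
from typing import Dict

# Operator probabilities that must sum to at most 1.0; whatever is left over
# is the probability of plain reproduction (an unchanged copy).
GENETIC_OPS = ("p_crossover", "p_subtree_mutation",
               "p_hoist_mutation", "p_point_mutation")


def genetic_params(p_crossover: float = 0.9,
                   p_subtree_mutation: float = 0.01,
                   p_hoist_mutation: float = 0.01,
                   p_point_mutation: float = 0.01,
                   p_point_replace: float = 0.05) -> Dict[str, float]:
    """Hypothetical helper: build SymbolicTransformer kwargs and check the budget."""
    params = {
        "p_crossover": p_crossover,
        "p_subtree_mutation": p_subtree_mutation,
        "p_hoist_mutation": p_hoist_mutation,
        "p_point_mutation": p_point_mutation,
    }
    total = sum(params[name] for name in GENETIC_OPS)
    if total > 1.0:
        raise ValueError(f"operator probabilities sum to {total:.3f} > 1.0")
    # Per-node rate for point mutation only; not part of the budget above.
    if not 0.0 <= p_point_replace <= 1.0:
        raise ValueError("p_point_replace must be in [0, 1]")
    params["p_point_replace"] = p_point_replace
    return params


def reproduction_rate(params: Dict[str, float]) -> float:
    """Probability that a program is copied unchanged into the next generation."""
    return 1.0 - sum(params[name] for name in GENETIC_OPS)
```

For example, `genetic_params(p_crossover=0.7, p_subtree_mutation=0.1, p_hoist_mutation=0.05, p_point_mutation=0.1)` shifts probability from crossover toward mutation, with about 0.05 left for reproduction. The result can be passed as `SymbolicTransformer(**params)`. These numbers are illustrative, not tuned. Compare the number of distinct expressions and the validation score before and after the change, since too much mutation can destroy strong candidates.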

