Y_train values for symbolicRegressor

Question

I split my dataset into X_train, Y_train, X_test and Y_test, and then I used the symbolicRegressor...

I've already converted the string values in the DataFrame to float values. But when applying the symbolicRegressor I get this error:

ValueError: could not convert string to float: 'd'

Where 'd' is a value from Y.

Since all the values in Y_train and Y_test are alphabetic characters because they are the "labels", I cannot understand why the symbolicRegressor tries to get a float number.

Any idea?

score 0 · Accepted Answer · answered Sep 18 '18 at 14:57

According to the gplearn documentation (https://gplearn.readthedocs.io/en/stable/index.html), "Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression that best describes a relationship". Pay attention to the word "mathematical". I am not an expert on this topic, and gplearn's description does not clearly define its area of applicability or its restrictions.

However, according to the source code (https://gplearn.readthedocs.io/en/stable/_modules/gplearn/genetic.html), the fit() method of the BaseSymbolic class contains the line X, y = check_X_y(X, y, y_numeric=True), where check_X_y() is sklearn.utils.validation.check_X_y(). The y_numeric argument means: "Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms".

So y values must be numeric.
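The error message in the question is what Python's float conversion produces when it is handed a letter, and that conversion is effectively what the numeric check ends up attempting on each label. A minimal sketch in plain Python, with no gplearn or scikit-learn involved; the labels and the helper name to_float_targets are made up for illustration:

```python
# Hypothetical labels, similar to the ones described in the question.
y_train = ["a", "d", "b", "d"]


def to_float_targets(labels):
    """Convert every label to float, as a numeric-only regressor requires."""
    return [float(label) for label in labels]


try:
    to_float_targets(y_train)
    error_message = None
except ValueError as exc:
    # float("a") fails on the first non-numeric label it meets.
    error_message = str(exc)

print(error_message)  # could not convert string to float: 'a'
```

Numeric strings such as "1" or "2.5" pass through this conversion without trouble, so only targets that really are class labels trigger the error.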

Thanks for your answer! It seems that I didn't check the meaning of X_train and Y_train... I thought that X_train is the training data set and Y_train is the "set of labels to all the data in X_train", which is why my Y_train consists of names or (in this case) alphabetic characters.... — Plop, Sep 19 '18 at 07:13

score 0 · Answer 2 · answered Apr 25 '19 at 06:13

Sorry for the late reply. gplearn supports regression (numeric y) with the SymbolicRegressor estimator, and with the newly released gplearn 0.4.0 we also support binary classification (two labels in y) using the SymbolicClassifier. From the sounds of things though, you have a multi-label problem which gplearn does not currently support. It may be something we look to support in the future.
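Since only two labels are supported, one possible workaround (an assumption here, not something gplearn does for you) is to split the labels into one binary target per class and fit a separate binary classifier on each. The sketch below only builds those 0/1 targets in plain Python; the helper name one_vs_rest_targets and the labels are hypothetical:

```python
def one_vs_rest_targets(labels):
    """Return {class_label: [1 if the sample has that label else 0, ...]}."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else 0 for label in labels] for c in classes}


# Made-up labels with three distinct classes.
y_train = ["a", "d", "b", "d"]

binary_targets = one_vs_rest_targets(y_train)
print(binary_targets["d"])  # [0, 1, 0, 1]
```

Each list in binary_targets has exactly two distinct values, which is the shape of y that the binary SymbolicClassifier described above expects; combining the per-class predictions back into one label is left out of this sketch.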
