
I am trying to do hyperparameter tuning using Optuna. The dataset is MovieLens (1M). In one script I tune Lasso, Ridge and KNN. Optuna works fine for Lasso and Ridge but gets stuck on KNN.

You can see the trials for the Ridge model finished at 2021-07-22 18:33:53, and a new study was created for the KNN right after, also at 2021-07-22 18:33:53. Now (at the time of posting) it is 2021-07-23 11:07:48, but there has not been a single trial for the KNN.

[I 2021-07-22 18:33:53,959] Trial 199 finished with value: -1.1917496039282074 and parameters: {'alpha': 3.553292157377711e-07, 'solver': 'sag', 'normalize': False}. Best is trial 71 with value: -1.1917485424789929.
[I 2021-07-22 18:33:53,961] A new study created in memory with name: no-name-208652b3-68ec-4464-a2ae-5afefa9bf133

The same thing happens with the SVR model (you can see Optuna stuck after trial number 84 at 2021-07-23 05:13:40):

[I 2021-07-23 05:13:37,907] Trial 83 finished with value: -1.593471166487258 and parameters: {'C': 834.9834466420455, 'epsilon': 99.19181748590665, 'kernel': 'linear', 'norm': 'minmax'}. Best is trial 61 with value: -1.553044709891868.
[I 2021-07-23 05:13:40,261] Trial 84 finished with value: -1.593471166487258 and parameters: {'C': 431.4022584640214, 'epsilon': 2.581688694428477, 'kernel': 'linear', 'norm': 'minmax'}. Best is trial 61 with value: -1.553044709891868.

Could you tell me why Optuna is getting stuck and how I can solve the issue?

Environment

  • Optuna version: 2.8.0
  • Python version: 3.8
  • OS: Linux CentOS 7
  • (Optional) Other libraries and their versions: scikit-learn, pandas, and other common libraries

Reproducible examples

The code I am using for hyperparameter tuning:

import numpy as np
import optuna
from optuna.trial import Trial
from pandas import DataFrame
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from typing import Callable, Dict


def tune(objective):
    # Maximize the objective (negative MSE), running 40 trials in parallel.
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=200, n_jobs=40)
    params = study.best_params
    return params


def knn_objective(X_train: DataFrame, y_train: DataFrame, cv_method: kfolds) -> Callable[[Trial], float]:
    # "kfolds" is a CV splitter type defined elsewhere in the full script.
    def objective(trial: Trial) -> float:
        args: Dict = dict(
            n_neighbors=trial.suggest_int("n_neighbors", 2, 40, 1),
            weights=trial.suggest_categorical("weights", ["uniform", "distance"]),
            metric=trial.suggest_categorical("metric", ["euclidean", "manhattan", "mahalanobis"]),
        )
        estimator = KNeighborsRegressor(**args)
        # Cross-validated negative MSE; cross_validate itself also parallelizes with n_jobs=-1,
        # on top of the n_jobs=40 used by study.optimize above.
        scores = cross_validate(
            estimator, X=X_train, y=y_train, scoring="neg_mean_squared_error", cv=cv_method, n_jobs=-1
        )
        return float(np.mean(scores["test_score"]))

    return objective
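
The objectives are then run one after another in the same script, roughly like this (a simplified sketch; ridge_objective, X_train, y_train and cv_method stand in for the actual variables):

# Sketch of the driver code (simplified): each tune() call creates its own in-memory study,
# which matches the "A new study created in memory" log line above.
ridge_params = tune(ridge_objective(X_train, y_train, cv_method))  # all 200 trials finish
knn_params = tune(knn_objective(X_train, y_train, cv_method))      # never logs a single trial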
0Knowledge
  • I have got the same issue with the XGBoost regressor. No matter what I do, the trials get stuck randomly. I had a large dataset, so I thought that was the issue. Then I switched to a fake dataset with only 200 rows. Ideally the search should finish in a few minutes, but it is stuck after the 3rd trial, and I am running this on Colab with a GPU – Bex T. Aug 04 '21 at 09:50
  • @BexT. Very pathetic! I made an issue (https://github.com/optuna/optuna/issues/2820), but no luck! – 0Knowledge Aug 04 '21 at 14:36
  • Hey, I think the issue is with the dataset and search space size. Turns out, I had some conflicting hyperparameters and too large of a search space. That's why some trials were taking forever. After reducing the search space to the more important hyperparameters, the search is faster. In other words, it doesn't look like "it is stuck" – Bex T. Aug 05 '21 at 05:56
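
Following up on the comment above about shrinking the search space, a narrower KNN objective could look like the sketch below (which parameters to drop is a judgment call; for instance, "mahalanobis" also needs a covariance matrix passed via metric_params and is much more expensive than the other metrics). It reuses the imports from the question code:

def knn_objective_reduced(X_train, y_train, cv_method):
    def objective(trial):
        args = dict(
            n_neighbors=trial.suggest_int("n_neighbors", 2, 40, 1),
            weights=trial.suggest_categorical("weights", ["uniform", "distance"]),
            # "mahalanobis" dropped: it requires metric_params={"VI": ...} and is far slower
            metric=trial.suggest_categorical("metric", ["euclidean", "manhattan"]),
        )
        estimator = KNeighborsRegressor(**args)
        scores = cross_validate(
            estimator, X=X_train, y=y_train, scoring="neg_mean_squared_error", cv=cv_method, n_jobs=-1
        )
        return float(np.mean(scores["test_score"]))

    return objective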

1 Answer


I have the same problem, but with the M1 SoC on a Mac. After 7-8 trials Optuna was stuck. I found a 'solution': specify the device for the computation. For my task I use TensorFlow; on an M1 Mac I can only use the miniforge build of it, and it usually works well with GPU computation via Metal. But for Optuna I add "with tf.device('/device:CPU:0'):" in:

def create_model(trial):
    with tf.device('/device:CPU:0'):
        ...

def create_optimizer(trial):
    with tf.device('/device:CPU:0'):
        ...

def objective(trial):
    with tf.device('/device:CPU:0'):
        ...

if __name__ == "__main__":
    with tf.device('/device:CPU:0'):
        ...

I'm not sure that this is the right solution, but it works for me (yes, it takes more time, but it works).
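
For reference, here is a minimal end-to-end sketch of this pattern (the model, data and search space below are placeholders, not the actual project code):

import numpy as np
import optuna
import tensorflow as tf
from tensorflow import keras

def objective(trial):
    # Pin all work for this trial to the CPU, as described above.
    with tf.device('/device:CPU:0'):
        units = trial.suggest_int("units", 8, 64)
        lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)

        model = keras.Sequential([
            keras.layers.Dense(units, activation="relu", input_shape=(10,)),
            keras.layers.Dense(1),
        ])
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")

        # Placeholder data; replace with the real training set.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(256, 10)).astype("float32")
        y = rng.normal(size=(256, 1)).astype("float32")

        history = model.fit(X, y, validation_split=0.2, epochs=3, verbose=0)
        return float(history.history["val_loss"][-1])

if __name__ == "__main__":
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=10)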

tensorflow-deps           2.9.0
tensorflow-estimator      2.9.0
tensorflow-macos          2.9.1
tensorflow-metal          0.5.0
optuna                    2.10.0

python3.9
Bersano