
I am using CatBoostClassifier and performing a grid search with its randomized_search() method. Unfortunately, the method prints iteration results to stdout for each tree built in every model tried.

There is a parameter that is supposed to control this: verbose. Ideally, verbose could be set to False to suppress all stdout output, or to an integer specifying the interval between reported models (models, not trees).

Do you know how to control this? I am getting millions of lines in my log files...

This question is somewhat related to How to suppress CatBoost iteration results?, but that one concerns the fit() method, which has logging_level and silent parameters as well. Another method, the cv() cross-validation, responds to logging_level='Silent' by cutting out all output.

1 Answer


Setting both logging_level='Silent' when instantiating the model and verbose=False when running the random search should suppress all output.

import catboost
from sklearn.datasets import make_classification
from scipy import stats

# generate some data
X, y = make_classification(n_features=10)

# instantiate the model with logging_level='Silent'
model = catboost.CatBoostClassifier(iterations=1000, logging_level='Silent')

pool = catboost.Pool(X, y)

parameters = {
    'learning_rate': stats.uniform(0.01, 0.1),
    'depth': stats.binom(n=10, p=0.2)
}

# run random search with verbose=False
randomized_search_results = model.randomized_search(
    parameters,
    pool,
    n_iter=10,
    shuffle=False,
    plot=False,
    verbose=False,
)
Flavia Giammarino
  • THNX a lot, it works! Even better, I have set `logging_level='Silent'` in the constructor and then `verbose=10` to get output every 10th model. Apart from limiting the log files, it most probably saves time. – Igor T. Podolak May 22 '21 at 15:59
  • It is not clear how combinations of `logging_level` and `verbose` really work. E.g.: `logging_level='Silent'` with `verbose=False` cuts all output; OK. `logging_level='Verbose'` gives info about every subtree in every model, i.e. I get learn/test values for N models tried. But with `logging_level='Silent'` and `verbose=1`, a random search over N models gives N consecutive lines with learn/test values which are **not** from different models, since the test values are increasing, but the time grows in jumps. Are the other `N-1` models checked? It is not clear :-( – Igor T. Podolak May 23 '21 at 09:08
  • From my side I see the `best` loss decreasing across the iterations, as expected. The `loss` itself does not necessarily decrease, but I think that this is expected in a random search. The training time can also vary depending on the combination of parameters sampled at a given iteration (e.g. higher depth can lead to longer training time). – Flavia Giammarino May 23 '21 at 15:14
  • This was my misunderstanding of the output: the lines returned with `logging_level='Verbose'` show __single-tree__ outputs, while those with `logging_level='Silent'` but `verbose=1` show model outputs, BUT the two look identical: number, current output, best model, time run, time left. The `best` value grows. The only difference is that the remaining time sometimes grows when the last checked model was intensive. This is a problem of identical-looking outputs and documentation that is not clear enough. THNX! – Igor T. Podolak May 24 '21 at 15:12