Use logging along with auto-sklearn

Question

I can't get my head around how to use logging in conjunction with auto-sklearn.

The example from the doc about logging with auto-sklearn is here. What I'm trying to achieve is:

a main script with a main logger,
functions running auto-sklearn models, each with a separate log.

I've made multiple attempts. One solution I found was to configure the root logger first (using basicConfig), then run an auto-sklearn model with a root logger configuration, and finally reset the root logger (using basicConfig(force=True)). This doesn't seem very pythonic to me, but it works.
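The workaround described above can be sketched without auto-sklearn itself. In this minimal sketch, run_model is a hypothetical stand-in for the auto-sklearn fit: it reconfigures the root logger through logging.config.dictConfig, much as the logging_config in the question does. The logger names and file names are placeholders, and force=True requires Python 3.8 or later.

```python
import logging
import logging.config

FMT = "%(levelname)s %(name)s %(message)s"


def run_model(model_log):
    # Hypothetical stand-in for AutoSklearnRegressor(..., logging_config=...).fit():
    # it attaches a file handler to the root logger, as the question's config does.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {"spec": {"class": "logging.FileHandler", "filename": model_log}},
        "root": {"level": "INFO", "handlers": ["spec"]},
    })
    logging.getLogger("library.internal").info("model output")


def main(main_log, model_log):
    logger = logging.getLogger("main_script")  # hypothetical logger name
    # 1. Configure the root logger for the main script.
    logging.basicConfig(filename=main_log, level=logging.INFO, format=FMT)
    logger.info("Starting modelisation")
    # 2. The model replaces the root logger's handlers with its own.
    run_model(model_log)
    # 3. Restore the main configuration; force=True removes and closes
    #    the handlers currently attached to the root logger first.
    logging.basicConfig(filename=main_log, level=logging.INFO, format=FMT, force=True)
    logger.info("Finished !")
```

The downside is visible in step 3: the main configuration has to be spelled out twice.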

The pythonic way would have been (I think) to use two named loggers. However, to my knowledge, auto-sklearn can't configure logging with anything but a config dictionary. Since you can't pass an existing logger as an argument, you have to rely on some internal mechanism triggered by specific logger names (names that appear in the default YAML file but are undocumented, AFAIK).
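For reference, with plain logging the two-named-loggers approach looks roughly like the sketch below. It only helps if the code doing the logging writes to loggers whose names you control, which auto-sklearn does not appear to allow; make_file_logger and any logger names passed to it are hypothetical.

```python
import logging

FORMATTER = logging.Formatter("%(levelname)s %(name)s %(message)s")


def make_file_logger(name, filename):
    """Return a named logger that writes only to its own file (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(filename, mode="a", encoding="utf8")
        handler.setFormatter(FORMATTER)
        logger.addHandler(handler)
    # Don't pass records up to the root logger, so they never reach its handlers.
    logger.propagate = False
    return logger
```

For example, make_file_logger("main", "main.log") and make_file_logger("model", "model.log") would keep the two streams in separate files.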

My current code is the following:

import logging
import pandas as pd
import numpy as np
from autosklearn.regression import AutoSklearnRegressor

# Basic logging config for the main script

file_handler = logging.FileHandler("main.log", mode="a", encoding="utf8")
console_handler = logging.StreamHandler()

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(file_handler)
logger.addHandler(console_handler)

# Construct a dummy dataframe for a short regression
df = pd.DataFrame(
    dict(x1=range(100), x2=range(50, 150), noise=np.random.normal(size=100))
    )
df['y'] = np.square(df.x1+df.noise) + df.x2


# Message is written to the main log and the console
logger.info("Starting modelisation")

# Logging configuration passed to auto-sklearn
logging_config = {
    "version":1,
    "disable_existing_loggers": False,
    "handlers":{
        "spec_logger":{
            'level':'INFO',
            "class":"logging.FileHandler",
            'filename':"dummy_autosklearn.log",
            },
        },
    'loggers': {
        "":{"handlers":["spec_logger"]}, # <- I'd say this is what is wrong here
        },
    }
model = AutoSklearnRegressor(
    memory_limit=None,
    time_left_for_this_task=30,
    logging_config=logging_config,
    )
model.fit(df[['x1', 'x2']], df['y'])


# Message is written to both logs as well as the console
logger.info("Finished !")

Running it, you get a main.log containing the two statements, which are also displayed in the console.

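But since auto-sklearn runs with a root logger configuration, the "Finished !" statement also ends up in dummy_autosklearn.log: my main logger propagates its records to the root logger, which now has the spec_logger handler attached.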
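The duplication comes from propagation: after a record is handled by the main logger's own handlers, it is passed on to the root logger's handlers, and the root logger's own level is not consulted along the way. The sketch below reproduces this with in-memory handlers (ListHandler and the name main_script are hypothetical, for demonstration only) and shows that setting propagate = False on the main logger is one way to keep its messages out of the root handler. Presumably auto-sklearn's own records, which do propagate to the root logger, would still reach dummy_autosklearn.log under that change.

```python
import logging


class ListHandler(logging.Handler):
    """Collects record messages in a list (demonstration only)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


root = logging.getLogger()
root_handler = ListHandler()  # plays the role of "spec_logger" on the root logger
root.addHandler(root_handler)

main_logger = logging.getLogger("main_script")  # hypothetical name
main_logger.setLevel(logging.INFO)
main_handler = ListHandler()  # plays the role of main.log / the console
main_logger.addHandler(main_handler)

# Handled by main_handler, then propagated to root_handler; the root
# logger's own level is not checked during propagation.
main_logger.info("Finished !")
print(main_handler.messages, root_handler.messages)  # ['Finished !'] ['Finished !']

# With propagation off, the record stops at the main logger's handlers.
main_logger.propagate = False
main_logger.info("Not duplicated")
print(root_handler.messages)  # ['Finished !']

root.removeHandler(root_handler)  # tidy up the demo
```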

How can I configure auto-sklearn in a simple way? (I'm only hoping to redirect the verbose output displayed by auto-sklearn, in case I need it in the future...)

There is an example [here](https://github.com/franchuterivera/auto-sklearn/blob/0711bb6dc21635795606352755d1158f11d84f4f/examples/40_advanced/example_logging.py) of the way you are supposed to handle logging with autosklearn. I still don't think it is sufficient, though, as the library creates multiple loggers whose records are redirected to the root logger. — tgrandje, Nov 16 '21 at 13:11
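One way to check which loggers a library actually creates (for example, right after model.fit(...)) is to inspect the logging manager's registry of logger names. A minimal diagnostic sketch: list_loggers is a hypothetical helper, the demo_pkg names stand in for whatever a library creates, and loggerDict is, as far as I know, an implementation detail of the standard logging module rather than documented API.

```python
import logging


def list_loggers(prefix=""):
    """Return the sorted names of all loggers created so far that start with prefix."""
    # Logger.manager.loggerDict maps names to Logger objects, or to PlaceHolder
    # objects for dotted parents that were never requested directly.
    items = list(logging.Logger.manager.loggerDict.items())
    return sorted(
        name for name, obj in items
        if isinstance(obj, logging.Logger) and name.startswith(prefix)
    )


# Hypothetical logger names, standing in for whatever a library creates at runtime.
logging.getLogger("demo_pkg.worker_1234")
logging.getLogger("demo_pkg.worker_5678")
print(list_loggers("demo_pkg"))  # ['demo_pkg.worker_1234', 'demo_pkg.worker_5678']
```

The names found this way are the ones a logging_config would have to cover under 'loggers'.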

Vinay Sajip · Answer 1 · 2021-09-30T19:03:11.023

Since autosklearn doesn't seem to let you skip logging configuration altogether (it's bad practice for a library to configure logging, but if you pass None, it uses an internal YAML file to configure logging), I suggest you pass in a configuration that covers everything and don't do any logging configuration other than through autosklearn itself. You can use the configuration from this example script:

import logging
import logging.config

logging_config = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '%(levelname)-8s %(name)-15s %(message)s'
        }
    },
    'handlers':{
        'console_handler': {
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        },
        'file_handler': {
            'class':'logging.FileHandler',
            'mode': 'a',
            'encoding': 'utf-8',
            'filename':'main.log',
            'formatter': 'simple'
        },
        'spec_handler':{
            'class':'logging.FileHandler',
            'filename':'dummy_autosklearn.log',
            'formatter': 'simple'
        },
    },
    'loggers': {
        '': {
            'level': 'INFO',
            'handlers':['file_handler', 'console_handler']
        },
        'autosklearn': {
            'level': 'INFO',
            'propagate': False,
            'handlers': ['spec_handler']
        },
        'smac': {
            'level': 'INFO',
            'propagate': False,
            'handlers': ['spec_handler']
        },
        'EnsembleBuilder': {
            'level': 'INFO',
            'propagate': False,
            'handlers': ['spec_handler']
        },
    },
}

logging.config.dictConfig(logging_config)
for name in (__name__, 'autosklearn', 'smac', 'EnsembleBuilder'):
    logger = logging.getLogger(name)
    logger.debug('DEBUG')
    logger.info('INFO')
    logger.warning('WARNING')
    logger.error('ERROR')
    logger.critical('CRITICAL')

Note that this configures your main module as well as autosklearn, but you don't do any logging configuration other than passing this config through. When you run the above, you should get results similar to the following.

The console should show:

INFO     __main__        INFO
WARNING  __main__        WARNING
ERROR    __main__        ERROR
CRITICAL __main__        CRITICAL

main.log should have the same:

INFO     __main__        INFO
WARNING  __main__        WARNING
ERROR    __main__        ERROR
CRITICAL __main__        CRITICAL

The other log file (dummy_autosklearn.log) should contain:

INFO     autosklearn     INFO
WARNING  autosklearn     WARNING
ERROR    autosklearn     ERROR
CRITICAL autosklearn     CRITICAL
INFO     smac            INFO
WARNING  smac            WARNING
ERROR    smac            ERROR
CRITICAL smac            CRITICAL
INFO     EnsembleBuilder INFO
WARNING  EnsembleBuilder WARNING
ERROR    EnsembleBuilder ERROR
CRITICAL EnsembleBuilder CRITICAL

Salient points:

Make sure levels are set on loggers, as those are checked first. Levels can also be set on handlers, but those are checked only if the logger passes the message on to them.
Set propagate to False to ensure autosklearn, smac and EnsembleBuilder messages don't reach the console or main.log.
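The first salient point (logger levels are checked before handler levels) can be seen in a small standalone sketch; LevelRecorder, demo and the levels_demo logger names are hypothetical and exist only for this illustration.

```python
import logging


class LevelRecorder(logging.Handler):
    """Records the level name of every record this handler emits (demonstration only)."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.seen = []

    def emit(self, record):
        self.seen.append(record.levelname)


def demo(logger_level, handler_level, name):
    logger = logging.getLogger(name)  # hypothetical demo logger names
    logger.propagate = False          # keep the demo away from the root logger
    logger.setLevel(logger_level)
    handler = LevelRecorder(handler_level)
    logger.addHandler(handler)
    for log in (logger.debug, logger.info, logger.warning):
        log("message")
    return handler.seen


# The logger's level is checked first: DEBUG never reaches the DEBUG-level handler.
print(demo(logging.INFO, logging.DEBUG, "levels_demo.a"))     # ['INFO', 'WARNING']
# The handler's level only filters records the logger already let through.
print(demo(logging.DEBUG, logging.WARNING, "levels_demo.b"))  # ['WARNING']
```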

Sorry for taking so long to test this. I tried your suggestion, but it doesn't seem to work as planned. Autosklearn seems to create more loggers than those listed in the YAML file (including loggers with random parts in their names), and their records end up in the root logger. I can't find a "clean" way to get rid of them other than configuring/restoring the root logger before/after using autosklearn. — tgrandje, Nov 16 '21 at 13:09
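The before/after workaround mentioned in this comment could be packaged as a context manager that snapshots the root logger's handlers and level and puts them back afterwards, instead of re-running basicConfig. A minimal sketch, assuming the library only changes the root logger's handlers and level: preserved_root_logger is a hypothetical helper, and it does not restore other state a dictConfig call may change (for instance, loggers disabled through disable_existing_loggers). A non-incremental dictConfig call also flushes and closes pre-existing handlers, so it is worth checking that the restored handlers still write as expected.

```python
import contextlib
import logging


@contextlib.contextmanager
def preserved_root_logger():
    """Snapshot the root logger's handlers and level; restore them on exit (hypothetical helper)."""
    root = logging.getLogger()
    saved_handlers = root.handlers[:]
    saved_level = root.level
    try:
        yield root
    finally:
        # Detach and close any handler that was added inside the block...
        for handler in root.handlers[:]:
            if handler not in saved_handlers:
                root.removeHandler(handler)
                handler.close()
        # ...then re-attach the original handlers (addHandler skips duplicates).
        for handler in saved_handlers:
            root.addHandler(handler)
        root.setLevel(saved_level)
```

Usage would then look like: with preserved_root_logger(): model.fit(df[['x1', 'x2']], df['y'])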
