Printing all output from sklearn GridSearchCV to file?

Question

I am running a long grid search using sklearn and I want to log all (emphasis on all) console output to a file. Running from the terminal with > and redirecting stdout to an open file both work, but only partially (the latter is the accepted answer here). Anything written by print is saved to the file, but not everything shown on the console is. In particular, for:

Fitting 5 folds for each of 128 candidates, totalling 640 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    2.7s
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed:   12.3s
[Parallel(n_jobs=4)]: Done 442 tasks      | elapsed:   35.1s
[Parallel(n_jobs=4)]: Done 640 out of 640 | elapsed:   55.7s finished

The first line does get saved to the file, but the logging from [Parallel(n_jobs=4)] is not. Instead, the file contains:

Fitting 5 folds for each of 128 candidates, totalling 640 fits
{'estimator__max_depth': 5, 'estimator__min_samples_leaf': 4, 'estimator__min_samples_split': 8}
...
...

The second line is just me printing the best parameters obtained; everything from [Parallel(n_jobs=4)] is lost. Does anyone know how to get this saved to the file as well?

You can try [re-directing the stdout from python to a file](https://stackoverflow.com/q/4675728/3374996). — Vivek Kumar, Jun 11 '18 at 06:25
Only the lines containing `[Parallel(n_jobs=4)...]` will be skipped if you followed that answer. That can be overcome by also redirecting the `sys.stderr` to the same file. — Vivek Kumar, Jun 12 '18 at 10:45
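The suggestion in the comment above can be sketched with the standard library's contextlib helpers. This is a minimal sketch, not the commenter's code: `run_to_file` and `demo` are hypothetical names, and `demo` only stands in for the grid search by writing one line to stdout and one to stderr.

```python
import contextlib
import sys


def run_to_file(func, path):
    """Run func() with both sys.stdout and sys.stderr pointed at one file."""
    with open(path, 'w') as logfile:
        with contextlib.redirect_stdout(logfile), contextlib.redirect_stderr(logfile):
            return func()


def demo():
    # Hypothetical stand-in for the grid search: one line per stream.
    print("line written to stdout")
    sys.stderr.write("line written to stderr\n")


if __name__ == "__main__":
    run_to_file(demo, "test.txt")
```

This only redirects writes that go through Python's `sys.stdout` and `sys.stderr` objects in the current process; nothing is echoed to the console while the redirection is active.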

ITA · Accepted Answer · 2020-10-28T20:49:17.930

From the source of the joblib package, which sklearn uses internally for parallelization:

def _print(self, msg, msg_args):
    """Display the message on stout or stderr depending on verbosity"""
    # XXX: Not using the logger framework: need to
    # learn to use logger better.
    if not self.verbose:
        return
    if self.verbose < 50:
        writer = sys.stderr.write
    else:
        writer = sys.stdout.write
    msg = msg % msg_args
    writer('[%s]: %s\n' % (self, msg))

So with verbose=1, as the OP was using, redirecting stderr ought to capture the missing lines. On its own, however, that does not capture stdout. One can merge the two streams using this answer and doing:

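import sys

# Tee is the stream-duplicating helper class from the linked answer.

logfile = open('test.txt', 'w')

original_stderr = sys.stderr
original_stdout = sys.stdout

# Send stdout to both the console and the file, then point stderr at the same stream.
sys.stdout = Tee(sys.stdout, logfile)
sys.stderr = sys.stdout

# ... code to log ...

# Restore the original streams and close the log file.
sys.stdout = original_stdout
sys.stderr = original_stderr
logfile.close()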
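The snippet above relies on a Tee class that is not shown here. As a rough sketch of what such a helper might look like (this is an assumption, not necessarily the class from the linked answer), it only needs `write` and `flush` methods that forward to every underlying stream. The `run_logged` wrapper is also a hypothetical name; it adds a try/finally so the original streams are restored even if the logged code raises.

```python
import sys


class Tee:
    """Hypothetical helper: forwards every write to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, data):
        for stream in self.streams:
            stream.write(data)
        return len(data)

    def flush(self):
        for stream in self.streams:
            stream.flush()


def run_logged(func, path):
    """Run func() with stdout and stderr mirrored to the file at path."""
    original_stdout, original_stderr = sys.stdout, sys.stderr
    with open(path, 'w') as logfile:
        sys.stdout = Tee(original_stdout, logfile)
        sys.stderr = sys.stdout
        try:
            return func()
        finally:
            # Restore the real streams even if func() raises.
            sys.stdout = original_stdout
            sys.stderr = original_stderr
```

Calling something like `run_logged(lambda: grid.fit(X, y), 'test.txt')`, where `grid`, `X` and `y` are the caller's own objects, would then keep the console output visible while also writing it to the file.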
