4

I have the following test file, called test_build, which has a test case that saves a scikit-learn model, together with X_train, y_train and the score data, as a tuple object to a ".pkl" file.

from build import *
import os
import pandas as pd
import sklearn
from sklearn import *
import unittest
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import numpy as np
import tempfile

class TestMachineLearningUtils(unittest.TestCase):

    def test_save_model(self):

        X, y = np.arange(10).reshape((5, 2)), range(5)
        model = RandomForestClassifier(n_estimators=300,
                                       oob_score=True,
                                       n_jobs=-1,
                                       random_state=123)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.33, random_state=42)
        clf = model.fit(X_train, y_train)
        score = model.score(X_test, y_test)
        dir_path = os.path.dirname(os.path.realpath(__file__))
        f = tempfile.TemporaryDirectory(dir=dir_path)
        pkl_file_name = os.path.join(f.name, "pickle_model.pkl")
        tuple_objects = (clf, X_train, y_train, score)
        path_model = save_model(tuple_objects, pkl_file_name)
        exists_model = os.path.exists(path_model)
        self.assertTrue(exists_model)



if __name__ == "__main__":
    unittest.main()

This is the content of the save_model function found in the build module I imported in my test file.

import pickle

def save_model(tuple_objects, model_path):
    # Serialize the tuple of objects to the given path.
    with open(model_path, "wb") as pkl_file:
        pickle.dump(tuple_objects, pkl_file)
    return model_path

The problem I am running into is that I cannot verify that the file is created inside the temporary directory. The file is apparently created, but judging from the message I receive, the directory is cleaned up right after it has been created.

C:\Users\User\AppData\Local\Continuum\miniconda3\envs\geoenv\lib\tempfile.py:798: ResourceWarning: Implicitly cleaning up <TemporaryDirectory>

Does anyone know a solution to this problem? How could one suppress the cleanup of a temporary directory created with Python's tempfile module?

almrog
  • 141
  • 6

2 Answers

5

It looks to me like your code does exactly what you want it to do, and you are just getting confused by the warning. The warning merely tells you that you should have deleted the temporary directory explicitly yourself, and that the module is kind enough to do it for you.

To delete the temporary directory yourself, either use it as a context manager or call its cleanup method.
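Applied to the test from the question, that could look roughly like the sketch below (it assumes the corrected save_model shown above, reuses the imports from your test file, and uses assertTrue for the existence check); the assertion has to run inside the with block, while the directory still exists:

class TestMachineLearningUtils(unittest.TestCase):

    def test_save_model(self):
        X, y = np.arange(10).reshape((5, 2)), range(5)
        model = RandomForestClassifier(n_estimators=300, oob_score=True,
                                       n_jobs=-1, random_state=123)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.33, random_state=42)
        clf = model.fit(X_train, y_train)
        score = model.score(X_test, y_test)

        dir_path = os.path.dirname(os.path.realpath(__file__))
        # The directory (and the pickle inside it) is removed when the
        # with block exits, so the assertion must happen inside it.
        with tempfile.TemporaryDirectory(dir=dir_path) as tmp_dir:
            pkl_file_name = os.path.join(tmp_dir, "pickle_model.pkl")
            path_model = save_model((clf, X_train, y_train, score),
                                    pkl_file_name)
            self.assertTrue(os.path.exists(path_model))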

Bananach
  • 2,016
  • 26
  • 51
  • The entire point of using the managed temporary directory is to get the cleanup on object destruction - so why would it make sense to complain? The "fix" would be to go one level deeper and write a finalizer that explicitly called `cleanup()` on the temp dir, but that again defeats the point of using something that already has a finalizer written for it. – Greg Mar 24 '22 at 10:10
  • @Greg you'd need to use it in a context manager to get the automatic deletion (and you should) – Bananach Mar 24 '22 at 12:29
  • Also, it's not the whole point. The other point is to get a platform independent temporary storage location (including having the `if exists: name+='_1'...` bit taken care of) – Bananach Mar 24 '22 at 12:37
  • It's probably my C++ background showing - in C++ you expect, if you have lifetime management (you don't always - but `__del__`/finalizers do provide that for TemporaryDirectory), that it will function cleanly without yelling. I'm feeding a complex callback which sometimes has to unpack files into a temporary directory and other times pass files directly - i.e. the scope is inside-out - so I can't use `with`. The "solution" to quiet the warning is to add an extra level of lifetime management and use finally to explicitly clean up - it's the opposite of what a C++ developer expects from convenience functions. – Greg Mar 30 '22 at 08:24
  • @Greg `__del__` isn't reliable in Python. The temporary directory does not have lifetime management; it only has a convenience function to warn you (and, out of the goodness of its heart, do your job for you) when it happens to detect that you forgot to do the lifetime management yourself (either by using a context manager or by explicitly calling the `cleanup` method). I'm not clear about your situation, but are you sure you can't use a context manager high enough up in the call stack? – Bananach Mar 30 '22 at 13:49
  • Maybe I'm not up to date on `__del__`. What I said about it being unreliable definitely used to be the case - see https://stackoverflow.com/questions/24596400/garbage-collector-and-problems-with-the-del-finalizer and https://stackoverflow.com/questions/41851098/how-does-del-interfere-with-garbage-collection - and likely inspired the TempDir authors, but maybe https://peps.python.org/pep-0442/ means that nowadays the temp dir really *could* just be managed for you? Even then, though, isn't it best to clean temp dirs up right when they aren't used anymore rather than when GC gets to them? – Bananach Mar 30 '22 at 14:03
-1

You simply don't. If you want the directory to outlast the scope, you create "not a temporary directory".

Or, more likely when testing: you create the directory in the test setup, fill it, run the test, and remove it in teardown, so each test is independent (see the sketch below).
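A rough sketch of that pattern, assuming the save_model function from the question (the placeholder tuple stands in for the real model objects):

import os
import shutil
import tempfile
import unittest

from build import save_model


class TestMachineLearningUtils(unittest.TestCase):

    def setUp(self):
        # Create a fresh directory before every test ...
        self.tmp_dir = tempfile.mkdtemp()

    def tearDown(self):
        # ... and remove it afterwards, so tests stay independent.
        shutil.rmtree(self.tmp_dir)

    def test_save_model(self):
        pkl_file_name = os.path.join(self.tmp_dir, "pickle_model.pkl")
        # Any picklable object is enough to exercise save_model here.
        path_model = save_model(("clf", "X_train", "y_train", 0.9),
                                pkl_file_name)
        self.assertTrue(os.path.exists(path_model))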

kwesolowski
  • 695
  • 8
  • 18