
I'm using SMOTE from imblearn.over_sampling, and while generating synthetic data to test it I found a sharp cutoff at which SMOTE breaks. When I generate data with the following code (courtesy of Jason Brownlee) and run it through SMOTE:

 from imblearn.over_sampling import SMOTE
 from sklearn.datasets import make_classification
 X, y = make_classification(n_samples=10000, n_features=15, n_redundant=0,
                            n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=1)
 oversample = SMOTE()
 X, y = oversample.fit_resample(X, y)

It works fine. However, when the number of features is 16...

 from imblearn.over_sampling import SMOTE
 from sklearn.datasets import make_classification
 X, y = make_classification(n_samples=10000, n_features=16, n_redundant=0,
                            n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=1)
 oversample = SMOTE()
 X, y = oversample.fit_resample(X, y)

SMOTE breaks. Why is this? Does anyone know of a SMOTE method that works with more than 15 features? By "breaks", I mean I get the error below:

Traceback (most recent call last):



 File "\\arete\shared\Los Angeles\Users\Active\bbonifacio\New ADVANCE\untitled1.py", line 13, in <module>
    X, y = oversample.fit_resample(X, y)

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\imblearn\base.py", line 83, in fit_resample
    output = self._fit_resample(X, y)

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 324, in _fit_resample
    nns = self.nn_k_.kneighbors(X_class, return_distance=False)[:, 1:]

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\sklearn\neighbors\_base.py", line 763, in kneighbors
    results = PairwiseDistancesArgKmin.compute(

  File "sklearn\metrics\_pairwise_distances_reduction.pyx", line 691, in sklearn.metrics._pairwise_distances_reduction.PairwiseDistancesArgKmin.compute

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 151, in threadpool_limits
    return threadpoolctl.threadpool_limits(limits=limits, user_api=user_api)

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 340, in __init__
    self._load_modules()

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
    self._find_modules_with_enum_process_module_ex()

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
    self._make_module_from_path(filepath)

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()

  File "C:\Users\bbonifacio\Anaconda3\lib\site-packages\threadpoolctl.py", line 646, in get_version
    config = get_config().split()

AttributeError: 'NoneType' object has no attribute 'split'

And here are the versions of packages:

scikit-learn: 1.1.1, imbalanced-learn: 0.9.1, threadpoolctl: 2.1.0

1 Answer


Big thanks to rickhg12hs for this answer!

The solution is to update threadpoolctl. SMOTE failed with my installed version, 2.1.0, but it works with the updated version. If anyone else is having this problem, type

pip install -U threadpoolctl

in your command terminal, and it should be fixed. Happy coding!
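To check which versions you actually have installed before and after upgrading, something like the following may help. It is only a minimal sketch: `installed_version` is a helper name made up for this example, and the names passed to it are the PyPI distribution names, which I am assuming match how the packages were installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI (assumed to match your environment).
for name in ("scikit-learn", "imbalanced-learn", "threadpoolctl"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")

# 2.1.0 is the threadpoolctl version that failed in the question above.
if installed_version("threadpoolctl") == "2.1.0":
    print("threadpoolctl 2.1.0 detected; try: pip install -U threadpoolctl")
```

Run it in the same environment (for example, the same Anaconda prompt) that you use to run SMOTE, since different environments can have different versions installed.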

  • Years later, I can confirm this also fixes the issue: 'NoneType' object has no attribute 'split' when using SMOTE. – PeCaDe Jul 21 '22 at 13:22