Is pandas_ml broken?

Question

The version info and the error are given below. Is pandas_ml broken, or am I doing something wrong? Why am I not able to import pandas_ml?

Basic info: the versions of Python, scikit-learn, and pandas-ml are given below:

Python                            3.8.2
scikit-learn                      0.23.0
pandas-ml                         0.6.1

Issue:

import pandas_ml as pdml

returns the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-47-79d5f9d2381c> in <module>
----> 1 import pandas_ml as pdml
      2 #from pandas_ml import ModelFrame
      3 #mf = pdml.ModelFrame(df.to_dict())
      4 #mf.head()

d:\program files\python38\lib\site-packages\pandas_ml\__init__.py in <module>
      1 #!/usr/bin/env python
      2 
----> 3 from pandas_ml.core import ModelFrame, ModelSeries       # noqa
      4 from pandas_ml.tools import info                         # noqa
      5 from pandas_ml.version import version as __version__     # noqa

d:\program files\python38\lib\site-packages\pandas_ml\core\__init__.py in <module>
      1 #!/usr/bin/env python
      2 
----> 3 from pandas_ml.core.frame import ModelFrame       # noqa
      4 from pandas_ml.core.series import ModelSeries     # noqa

d:\program files\python38\lib\site-packages\pandas_ml\core\frame.py in <module>
      8 
      9 import pandas_ml.imbaccessors as imbaccessors
---> 10 import pandas_ml.skaccessors as skaccessors
     11 import pandas_ml.smaccessors as smaccessors
     12 import pandas_ml.snsaccessors as snsaccessors

d:\program files\python38\lib\site-packages\pandas_ml\skaccessors\__init__.py in <module>
     13 from pandas_ml.skaccessors.linear_model import LinearModelMethods                 # noqa
     14 from pandas_ml.skaccessors.manifold import ManifoldMethods                        # noqa
---> 15 from pandas_ml.skaccessors.metrics import MetricsMethods                          # noqa
     16 from pandas_ml.skaccessors.model_selection import ModelSelectionMethods           # noqa
     17 from pandas_ml.skaccessors.neighbors import NeighborsMethods                      # noqa

d:\program files\python38\lib\site-packages\pandas_ml\skaccessors\metrics.py in <module>
    254 _true_pred_methods = (_classification_methods + _regression_methods
    255                       + _cluster_methods)
--> 256 _attach_methods(MetricsMethods, _wrap_target_pred_func, _true_pred_methods)
    257 
    258 

d:\program files\python38\lib\site-packages\pandas_ml\core\accessor.py in _attach_methods(cls, wrap_func, methods)
     91 
     92         for method in methods:
---> 93             _f = getattr(module, method)
     94             if hasattr(cls, method):
     95                 raise ValueError("{0} already has '{1}' method".format(cls, method))

AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

score 4 · Answer 1 · answered May 15 '20 at 13:26

It seems it is indeed. Here is the situation:

Although the function jaccard_similarity_score is not listed among the available sklearn.metrics functions in the documentation, it was still there under the hood (and hence available) until v0.22.2 (source code), alongside jaccard_score. In the source code of the latest v0.23, however, it has been removed, and only jaccard_score remains.
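To see which of the two names a given installation actually exposes, the module can be probed directly. This is only a minimal sketch: `missing_attributes` is a hypothetical helper written for illustration (it is not part of scikit-learn or pandas-ml), and it simply reports which of the requested names an importable module lacks.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the entries of `names` that the module does not expose.

    Hypothetical helper for illustration; raises ImportError if the
    module itself cannot be imported.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        # The two function names discussed in the answer above.
        print(missing_attributes(
            "sklearn.metrics",
            ["jaccard_score", "jaccard_similarity_score"],
        ))
    except ImportError:
        print("scikit-learn is not installed in this environment")
```

With the scikit-learn 0.23.0 from the question, the traceback suggests the printed list would contain `jaccard_similarity_score`; an empty list means both names are still present.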

This would imply that it could still be possible to use pandas-ml by simply downgrading scikit-learn to v0.22.2. Unfortunately, this does not work either; it throws a different error:

!pip install pandas-ml
# Successfully installed enum34-1.1.10 pandas-ml-0.6.1

import sklearn
sklearn.__version__
# '0.22.2.post1'

import pandas_ml as pdml

[...]

AttributeError: module 'sklearn.preprocessing' has no attribute 'Imputer'

I guess it would be possible to find a scikit-learn version that works with it by going back far enough (the last commit in their GitHub repo was in March 2019), but I am not sure it is worth the fuss. In any case, they do not even mention scikit-learn (let alone any specific version of it) in their requirements file, which does not seem like sound practice, and the whole project seems rather abandoned.
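Since the package itself declares no scikit-learn requirement, a version check has to be done by hand. The sketch below compares the installed scikit-learn release against 0.22.2, the lowest version shown to fail in this answer. Treating everything at or beyond that release as incompatible is an assumption, and `release_tuple` and `at_or_beyond` are hypothetical helper names. It uses `importlib.metadata`, which is in the standard library from Python 3.8 onwards.

```python
import re
from importlib import metadata  # standard library since Python 3.8


def release_tuple(version):
    """Turn '0.22.2.post1' into (0, 22, 2), ignoring suffixes like '.post1'."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_or_beyond(installed, floor):
    """True if the installed release is the same as or newer than `floor`."""
    return release_tuple(installed) >= release_tuple(floor)


if __name__ == "__main__":
    # Assumption: 0.22.2 is used as the floor only because it is the lowest
    # scikit-learn version that this answer shows failing with pandas-ml 0.6.1.
    FIRST_KNOWN_FAILING = "0.22.2"
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed in this environment")
    else:
        if at_or_beyond(installed, FIRST_KNOWN_FAILING):
            print(f"scikit-learn {installed}: 'import pandas_ml' is likely to fail")
        else:
            print(f"scikit-learn {installed}: older than the versions tested here")
```

A result below the floor does not guarantee that the import succeeds; it only means the version is outside the range tested in this answer.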

Thanks for your quick response. Much appreciated. You are right, it seems to be abandoned. The last update was almost 2 years ago. I tried downgrading sklearn to version 0.18.0, but that didn't work either and threw a bunch of errors. ```install scikit-learn==0.18.2``` threw ```Building wheel for scikit-learn (setup.py) ... error```. Is there any other library similar to pandas-ml that can be used? Thanks again for your response. — Aviral Bansal, May 15 '20 at 13:33
@AviralBansal Not sure what you mean by "similar"; had never heard of this package before, so I don't know what its perceived value would be. It would certainly seem that the Python ML ecosystem is "blessed" with lots of stuff and was never critically dependent on this package. You are very welcome to accept the answer. — desertnaut, May 15 '20 at 13:40
@AviralBansal that said, there is [`sklearn-pandas`](https://github.com/scikit-learn-contrib/sklearn-pandas), but it looks abandoned, too. — desertnaut, May 15 '20 at 13:47

score 1 · Accepted Answer · answered May 17 '20 at 08:27

So after some time and effort on this, I got it working and realized that the concept of "broken" in Python is rather murky. It depends on the combination of libraries you are trying to use and their dependencies. The older releases are all still available and can be used, but finding the combination of package versions that gets everything working can be a trial-and-error process.

The other thing I learnt from this exercise is how important it is to be proficient at creating and managing virtual environments when programming in Python.

In my case, some friends helped with the trial-and-error part, and we found that pandas_ml works on Python 3.7. Given below is the pip freeze output, which can be used to set up a reliable virtual environment for machine learning and deep learning work with libraries such as pandas_ml and imbalanced-learn. The list may also include other libraries that have not had a new release in the last few years.

To create a working environment in which both pandas_ml and imbalanced-learn work, use Python 3.7 with the following package versions:

backcall==0.1.0
colorama==0.4.3
cycler==0.10.0
decorator==4.4.2
enum34==1.1.10
imbalanced-learn==0.4.3
ipykernel==5.2.1
ipython==7.14.0
ipython-genutils==0.2.0
jedi==0.17.0
joblib==0.15.0
jupyter-client==6.1.3
jupyter-core==4.6.3
kiwisolver==1.2.0
matplotlib==3.2.1
numpy==1.15.4
pandas==0.24.2
pandas-ml==0.6.1
parso==0.7.0
pickleshare==0.7.5
prompt-toolkit==3.0.5
Pygments==2.6.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
pywin32==227
pyzmq==19.0.1
scikit-learn==0.20.0
scipy==1.3.3
six==1.14.0
threadpoolctl==2.0.0
tornado==6.0.4
traitlets==4.3.3
wcwidth==0.1.9
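After the environment has been created, a quick way to confirm that the key pins from the list above are really installed is to compare them with the output of `pip freeze`. This is a sketch only: `parse_pins`, `diff_pins`, and `installed_pins` are hypothetical helper names, and only a handful of the pins are checked for brevity.

```python
import re
import subprocess
import sys


def parse_pins(text):
    """Parse 'name==version' lines (pip freeze style) into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and non-pinned entries
        name, version = line.split("==", 1)
        # Normalise case and separators so 'Pygments' matches 'pygments'.
        key = re.sub(r"[-_.]+", "-", name.strip()).lower()
        pins[key] = version.strip()
    return pins


def diff_pins(expected, installed):
    """Map each mismatching package to (wanted, found); found may be None."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


def installed_pins():
    """Run 'pip freeze' for the current interpreter and parse its output."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    )
    return parse_pins(result.stdout)


if __name__ == "__main__":
    # A subset of the pins listed above.
    KEY_PINS = """
    imbalanced-learn==0.4.3
    numpy==1.15.4
    pandas==0.24.2
    pandas-ml==0.6.1
    scikit-learn==0.20.0
    """
    try:
        problems = diff_pins(parse_pins(KEY_PINS), installed_pins())
    except (subprocess.CalledProcessError, OSError) as exc:
        print(f"could not run pip freeze: {exc}")
    else:
        for name, (wanted, found) in sorted(problems.items()):
            print(f"{name}: wanted {wanted}, found {found}")
        if not problems:
            print("all key pins match")
```

Saving the full list to a file and passing its contents to `parse_pins` would check every package instead of the subset used here.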

Hope this helps someone who is looking for the right combination of library versions to set up a machine learning and deep learning environment in Python with the pandas_ml and imbalanced-learn packages.

Please notice that, as it seemed we had already clarified in the (now deleted) comments and chat, this is a great answer to a *different* question. Your question, as all good and valid questions here, was indeed very **specific**: "*Is pandas_ml broken?*" mentioning **specifically** the setting, including `scikit-learn v0.23.0`. This was answered clearly and unambiguously. What you should do here, instead of unaccepting the valid answer to your specific question, is to open a new one "*How can I make pandas_ml work*" (including a link to this one), which you could proceed to self-answer (1/2) — desertnaut, May 17 '20 at 14:09
SO actually [explicitly encourages](https://stackoverflow.com/help/self-answer) this type of self-answering. Instead, by answering a different question *here*, you are acting as if the existing one is invalid and/or unhelpful, which is clearly not the case. I'm afraid that such invalidating of valid answers is not how SO works, and it is completely contrary to its spirit (2/2). — desertnaut, May 17 '20 at 14:17
