
I am using PyCaret and get an error.

AttributeError: 'SimpleImputer' object has no attribute '_validate_data'

I get it when trying to create a basic instance:

    # Create a basic PyCaret instance
    import pycaret
    from pycaret.regression import *

    mlb_pycaret = setup(data=pycaret_df, target='pts', train_size=0.8,
                        numeric_features=['home', 'first_time_pitcher'],
                        session_id=123)

All my variables are numeric (two of them were boolean, and I coerced them to numeric). My target variable shows up as Label, which is the default.

I also installed PyCaret, imported its regression module, reinstalled scikit-learn, and imported SimpleImputer with `from sklearn.impute import SimpleImputer`.

    OBP_avg                   Numeric
    SLG_avg                   Numeric
    SB_avg                    Numeric
    RBI_avg                   Numeric
    R_avg                     Numeric
    home                      Numeric
    first_time_pitcher        Numeric
    park_ratio_OBP            Numeric
    park_ratio_SLG            Numeric
    SO_avg_p                  Numeric
    pts_500_parkadj_p         Numeric
    pts_500_parkadj           Numeric
    SLG_avg_parkadj           Numeric
    OPS_avg_parkadj           Numeric
    SLG_avg_parkadj_p         Numeric
    OPS_avg_parkadj_p         Numeric
    pts_BxP                   Numeric
    SLG_BxP                   Numeric
    OPS_BxP                   Numeric
    whip_SO_BxP               Numeric
    whip_SO_B                 Numeric
    whip_SO_B_parkadj         Numeric
    order                     Numeric
    ops x pts_500 order15     Numeric
    ops x pts_500 parkadj     Numeric
    ops23 x pts_500           Numeric
    ops x pts_500 orderadj    Numeric
    whip_p                    Numeric
    whip_SO_p                 Numeric
    whip_SO_parkadj_p         Numeric
    whip_parkadj_p            Numeric
    pts                       Label

My traceback is the following:

Anakin Skywalker
  • Why did you import `pycaret` first and what version of `scikit-learn` is being used by the environment? Have you tried skipping the `numeric_features` parameter? What about trying the `numeric_iterative_imputer` and `numeric_imputation` parameters? – Mark Moretto Nov 25 '20 at 21:19
  • @MarkMoretto, I tried skipping `numeric_features`; it does not help. I get the same error, just with a much longer traceback. I deliberately imported `pycaret` and its regression module right above the code, because in the past the import order mattered. If I skip `numeric_features`, `home` is treated as categorical and the second variable does not get a class at all. I am not sure why `pts` is a `label`, since I am not doing classification (maybe that is fine). The parameters you mentioned are in PyCaret's `clustering` module, so I am not sure they will help here. – Anakin Skywalker Nov 25 '20 at 21:48
  • @MarkMoretto, I dropped half of the highly correlated variables and everything worked perfectly. What happened? No clue. – Anakin Skywalker Nov 26 '20 at 05:44
  • Sweet! I was curious whether it would be a version issue since my installation also didn't have that function. But, I'm glad to hear you got it working! – Mark Moretto Nov 27 '20 at 01:43
  • Likely the wrong version of sklearn. As of 6/2021 it should be 0.23.2, not 0.24. – wordsforthewise Jun 12 '21 at 23:13
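On the version question raised in the comments: the sketch below reports which scikit-learn the running interpreter actually imports and whether its `SimpleImputer` has the private `_validate_data` method named in the error. `diagnose_sklearn` is a hypothetical helper, not part of PyCaret or scikit-learn. Because `_validate_data` is private, whether it exists depends on the installed scikit-learn release.

```python
import importlib.util


def diagnose_sklearn():
    """Report the imported scikit-learn version and whether SimpleImputer has _validate_data.

    Hypothetical diagnostic helper, not a PyCaret or scikit-learn API.
    """
    if importlib.util.find_spec("sklearn") is None:
        # scikit-learn is not installed for the interpreter running this code
        return {"sklearn": None, "has_validate_data": None}

    import sklearn
    from sklearn.impute import SimpleImputer

    return {
        "sklearn": sklearn.__version__,
        # _validate_data is a private helper, so its presence varies between releases
        "has_validate_data": hasattr(SimpleImputer(), "_validate_data"),
    }


print(diagnose_sklearn())
```

Run it in the same notebook kernel that calls `setup()`, so the result reflects that environment rather than whichever `pip` happens to be on the PATH.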

4 Answers


The problem here is the imputation. Per the PyCaret documentation the default `imputation_type` is 'simple', but in this case you need to set `imputation_type='iterative'` for it to work.
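Applied to the question's code, that suggestion would look roughly like the sketch below. It assumes a PyCaret release whose `setup()` accepts `imputation_type` (PyCaret 2.2 or later). The wrapper name `setup_with_iterative_imputation` is hypothetical, and PyCaret is imported inside it only so the definition loads even where PyCaret is not installed.

```python
def setup_with_iterative_imputation(df):
    """The question's setup() call, switched to iterative imputation.

    Hypothetical wrapper; df is the question's pycaret_df.
    """
    # Imported lazily so this definition loads even where PyCaret is absent
    from pycaret.regression import setup

    return setup(
        data=df,
        target="pts",
        train_size=0.8,
        numeric_features=["home", "first_time_pitcher"],
        imputation_type="iterative",  # documented default is "simple"
        session_id=123,
    )


# Usage, with the question's DataFrame:
# mlb_pycaret = setup_with_iterative_imputation(pycaret_df)
```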

hrokr
  • This works, but it may also be that your version of sklearn is wrong. As of 6/2021 it should be version 0.23.2, not the 0.24 version. Using 0.24 at this time will cause the same error. – wordsforthewise Jun 12 '21 at 23:13
  • I changed it as you said, but now I get another error: `ModuleNotFoundError: No module named 'sklearn.kernel_ridge'` – igorkf Sep 01 '21 at 02:22

It's a library incompatibility. Reinstall PyCaret with:

    pip install pycaret pandas shap
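When reinstalling from a notebook, a common pitfall is that the `pip` on the PATH belongs to a different environment than the kernel. A minimal sketch that targets the kernel's own interpreter through `sys.executable -m pip`; the helper name `pip_install_cmd` is hypothetical, and the actual install is left commented out.

```python
import sys


def pip_install_cmd(*packages):
    """Build a pip install command for the interpreter running this code (hypothetical helper)."""
    # sys.executable is the current Python, so packages land in this kernel's environment
    return [sys.executable, "-m", "pip", "install", *packages]


# To actually reinstall the packages from this answer:
# import subprocess
# subprocess.check_call(pip_install_cmd("pycaret", "pandas", "shap"))
print(" ".join(pip_install_cmd("pycaret", "pandas", "shap")))
```

In Jupyter, the `%pip install` magic serves the same purpose.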

Henrique

Good day all. What helped me was installing `pycaret==2.3.10` and `scikit-learn==0.23.2` together. These two versions are compatible and everything works. I installed scikit-learn with conda, as the older version was not available to me through pip, and I installed PyCaret with pip3. I hope this helps everyone who has struggled to get this working like I did.
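To confirm that an environment actually has the pair this answer reports as working, a small standard-library check can compare installed versions against those pins. `version_mismatches` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pair reported as compatible in this answer; adjust if you target other releases
EXPECTED = {"pycaret": "2.3.10", "scikit-learn": "0.23.2"}


def version_mismatches(expected=EXPECTED):
    """Return {package: (installed, expected)} for each package that is missing or differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            mismatches[package] = (installed, wanted)
    return mismatches


print(version_mismatches() or "pycaret and scikit-learn match the pinned versions")
```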

KYS

Here is what worked for me on this error:

Open `C:\Users\Eric\.conda\envs\AUTOGLUON\lib\site-packages\sklearn\impute\_base.py` (adjust the path to your own environment), go to line 568, and find the following line of code:

`if self.strategy == "constant" or self.keep_empty_features:`

Perform the following change, then save the file:

Change this:

    if self.strategy == "constant" or self.keep_empty_features:
        valid_statistics = statistics
        valid_statistics_indexes = None

To this:

    if self.strategy == "constant" or (hasattr(self, 'keep_empty_features') and self.keep_empty_features):
        valid_statistics = statistics
        valid_statistics_indexes = None

Save your changes, restart the notebook's Python kernel, and run the code again. It should now work, or at least I hope it does for you.
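The path in this answer is specific to the author's machine. A sketch for locating the corresponding file in your own environment, assuming scikit-learn still keeps `SimpleImputer` in the private module `sklearn.impute._base` as that path implies; `locate_impute_base` is a hypothetical helper. Line numbers inside the file differ between scikit-learn versions, so searching for the quoted `if` line is more reliable than jumping to line 568.

```python
import importlib.util


def locate_impute_base():
    """Return the path of sklearn/impute/_base.py for this interpreter, or None."""
    if importlib.util.find_spec("sklearn") is None:
        return None  # scikit-learn is not installed here
    # Finding a submodule imports its parent packages (sklearn, sklearn.impute)
    spec = importlib.util.find_spec("sklearn.impute._base")
    return spec.origin if spec is not None else None


print(locate_impute_base())
```

Keep in mind that edits made directly in site-packages are lost when scikit-learn is reinstalled or upgraded.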

Quantum Prophet