I am creating Accumulated Local Effects (ALE) plots using the ale function from Python's PyALE package. The model is a scikit-learn RandomForestRegressor.

I can create 1D ALE plots. However, I get a ValueError when I try to create a 2D ALE plot using the same model and training data.

Here is the 2D call that raises the error.

ale(training_data, model=model1, feature=["feature1", "feature2"])

I can create 1D ALE plots for feature1 and feature2 with the following code.

ale(training_data, model=model1, feature=["feature1"], feature_type="continuous")

ale(training_data, model=model1, feature=["feature2"], feature_type="continuous")

There are no missing or infinite values for any column in the data frame.
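For reference, a check along these lines is one way to confirm that. This is only a sketch: has_non_finite is a hypothetical helper name, and the two small frames are stand-ins for training_data, which is not shown here.

```python
import numpy as np
import pandas as pd


def has_non_finite(df: pd.DataFrame) -> bool:
    """Return True if any numeric cell is NaN or +/- infinity."""
    numeric = df.select_dtypes(include="number")
    return not np.isfinite(numeric.to_numpy(dtype="float64")).all()


# Hypothetical stand-ins for training_data
clean = pd.DataFrame({"feature1": [0.1, 0.2, 0.3], "feature2": [1.0, 2.0, 3.0]})
dirty = clean.copy()
dirty.loc[1, "feature2"] = np.inf

print(has_non_finite(clean))
print(has_non_finite(dirty))
```

If a check like this reports no non-finite values, the NaN in the error message is probably being introduced somewhere inside the call rather than coming from the input data.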

I am getting the following error with the 2D ALE plot command.

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The package is documented here: https://pypi.org/project/PyALE/#description

I am not sure why I am getting this error. I would appreciate some help on this.

Thank you,

Rohin

1 Answer

This issue was fixed in release v1.1.2 of the PyALE package. For those using earlier versions, the workaround mentioned in the GitHub issue thread is to reset the index of the dataset passed to the ale function. For completeness, here is code that reproduces the error and shows the workaround:

from PyALE import ale
import pandas as pd
import matplotlib.pyplot as plt
import random
from sklearn.ensemble import RandomForestRegressor

# get the raw diamond data (from R's ggplot2)
dat_diamonds = pd.read_csv(
    "https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/diamonds.csv"
)
X = dat_diamonds.loc[:, ~dat_diamonds.columns.str.contains("price")].copy()
y = dat_diamonds.loc[:, "price"].copy()

features = ["carat","depth", "table", "x", "y", "z"]

# fit the model
model = RandomForestRegressor(random_state=1345)
model.fit(X[features], y)

# sample the data; rows selected with .loc keep their original,
# non-contiguous index labels
random.seed(1234)
indices = random.sample(range(X.shape[0]), 10000)
sampleData = X.loc[indices, :]

# compute the 2D effects
# On PyALE versions before v1.1.2 this raises the ValueError
ale_eff = ale(X=sampleData[features], model=model, feature=["z", "table"], grid_size=100)

# Workaround: reset the index with drop=True before calling ale
ale_eff = ale(X=sampleData[features].reset_index(drop=True), model=model, feature=["z", "table"], grid_size=100)
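As to why resetting the index helps: one plausible mechanism, which is an assumption here and not taken from the PyALE source, is pandas label alignment. When values carrying a fresh 0..n-1 index are combined with a frame that still has its original sampled labels, pandas matches rows by label and fills the non-matching rows with NaN. The toy frame and the "effect" column below are made up purely for illustration.

```python
import pandas as pd

# Rows sampled from a larger frame keep their original labels.
sampled = pd.DataFrame({"z": [2.4, 3.1, 2.9]}, index=[17, 4, 42])

# Values computed elsewhere that carry a default 0..n-1 index.
effects = pd.Series([0.5, -0.2, 0.1], name="effect")

# Assignment aligns on labels: none of 17, 4, 42 match 0, 1, 2,
# so every value in the new column is NaN.
misaligned = sampled.assign(effect=effects)

# After reset_index(drop=True) the labels match and the values line up.
aligned = sampled.reset_index(drop=True).assign(effect=effects)

print(misaligned)
print(aligned)
```

NaN values created this way would then reach the model's predict call, which would be consistent with the error message in the question.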
DS_UNI