I posted previously with a similar problem (Categorical Data with tpot). Thanks to Randy, I was able to get the code running, but now that I am stopping it hours later, I am getting a similar error:

  File "XXXXXXXX", line 832, in score
    if np.any(np.isnan(testing_features)):

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I'm not sure whether I'm stopping it incorrectly (I just hit Ctrl+C in Spyder) or whether there is some other issue. I made sure the data is all numerical, including the feature titles. Any idea what the problem may be?

Here is the code I'm running:

train_x, test_x, train_y, test_y=train_test_split(x,y)
train_x=pd.get_dummies(train_x).values
from tpot import TPOTRegressor

regressor=TPOTRegressor()
regressor.fit(train_x,train_y)
print(regressor.score(test_x,test_y))

I don't know how to show the contents of the train and test arrays. train_x is a float64 array of shape (2400, 62) and train_y is a Series of shape (2400,).

Developer Guy
Deborah Paul
  • What is the type of `testing_features`? – TwistedSim Apr 17 '18 at 19:00
  • How can I find the type of the variable? It's within the tpot code. I'm sorry I'm new to python. – Deborah Paul Apr 17 '18 at 19:00
  • Then can you show the code you use to call the function that causes the error? – TwistedSim Apr 17 '18 at 19:12
  • Yes, I'm sorry. I completely forgot that part. I'll edit the post above to include my code. – Deborah Paul Apr 17 '18 at 19:31
  • Which line causes the error? (EDIT: I just saw it's the score call.) Try `print(type(test_x), type(test_y))`. – TwistedSim Apr 17 '18 at 19:38
  • The error is from the line: print(regressor.score(test_x,test_y)). Do you know if ctrl c is the correct way to stop the code early? – Deborah Paul Apr 17 '18 at 19:41
  • ctrl+c should stop the code execution. Try to reset your workspace if you have one. – TwistedSim Apr 17 '18 at 19:43
  • test_x is a dataframe. and test_y is a series. That might be my problem. I just changed test_x to float64 with the one hot encoder. I'll try to run it again. – Deborah Paul Apr 17 '18 at 19:44
  • Just to clarify, should I reset my workspace to stop the tpot optimization and read the current results? I can reset my variables, but that wouldn't stop the code execution, would it? – Deborah Paul Apr 17 '18 at 19:53
  • @DeborahPaul, make sure you're running the testing features through the same preprocessing steps as the training features. Those preprocessing steps are at https://stackoverflow.com/a/49823248/1383444 – Randy Olson Apr 17 '18 at 20:07
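
The last comment's advice (run the test features through the same preprocessing as the training features) can be sketched as follows. This is only an illustration under assumptions: the toy DataFrames and the column names `size` and `color` are hypothetical stand-ins for the asker's data, and aligning the test columns with `reindex` is one possible approach, not necessarily what the linked answer does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's train/test split.
train_x = pd.DataFrame({"size": [1.0, 2.0, 3.0], "color": ["red", "blue", "red"]})
test_x = pd.DataFrame({"size": [4.0, 5.0], "color": ["blue", "green"]})

# One-hot encode the training features.
train_enc = pd.get_dummies(train_x)

# Encode the test features the same way, then align them to the training
# columns: categories unseen in training are dropped, and training
# categories missing from the test set are filled with 0.
test_enc = pd.get_dummies(test_x).reindex(columns=train_enc.columns, fill_value=0)

# Convert both to float64 arrays so that np.isnan can be applied to them.
train_arr = train_enc.to_numpy(dtype=np.float64)
test_arr = test_enc.to_numpy(dtype=np.float64)

print(train_arr.shape, test_arr.shape, test_arr.dtype)
```

With both arrays encoded identically, `test_arr` has the same number of columns as `train_arr` and a numeric dtype, which is what the `np.isnan` check in the traceback needs.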

2 Answers


Using the first solution gave me the following error:

x_train = x_train.astype(np.float64)
x_test = x_test.astype(np.float64)
ValueError: setting an array element with a sequence.

Converting the features to a NumPy array by way of a list did the trick for me, even though they were already NumPy arrays:

x_train = np.array(list(x_train), dtype=np.float64)
x_test = np.array(list(x_test), dtype=np.float64)

Alex L
Adnan Taufique
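
One plausible reason the answer above needed the list round-trip, offered as an assumption since the answer does not show its data: the features may have been a one-dimensional object array whose elements were themselves arrays. The sketch below builds such an array by hand (the name `rows` is hypothetical) and shows that `astype` fails on it while `np.array(list(...))` stacks it into a proper 2-D float array.

```python
import numpy as np

# Hypothetical input: a 1-D object array whose elements are arrays.
rows = np.empty(3, dtype=object)
for i in range(3):
    rows[i] = np.array([i, i + 1.0])

# astype cannot turn each array-valued element into a single float.
try:
    rows.astype(np.float64)
    astype_failed = False
except ValueError:
    astype_failed = True

# Going through a list lets NumPy stack the inner arrays into a 2-D array.
fixed = np.array(list(rows), dtype=np.float64)

print(astype_failed, fixed.shape, fixed.dtype)
```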

For some reason, TPOT raises this isnan-related error when the real problem is the data type. Ensure your features are converted to floats:

X = X.astype(np.float64)
Michael Davidson
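
To see why a dtype problem surfaces as an `isnan` error, here is a minimal sketch using a hand-made object-dtype array (the name `X_obj` is hypothetical and not taken from the question). `np.isnan` is not defined for object arrays, so the check in the traceback raises `TypeError` before any NaN is ever looked for; converting to `float64` first makes the same check work.

```python
import numpy as np

# Hypothetical feature matrix stored with dtype=object, as can happen when
# a DataFrame with mixed column types is converted to an array.
X_obj = np.array([[1, 2.5], [3, 4]], dtype=object)

# np.isnan has no implementation for object arrays, so it raises TypeError.
try:
    np.isnan(X_obj)
    raised = False
except TypeError:
    raised = True

# After casting to float64, the NaN check runs normally.
X = X_obj.astype(np.float64)
has_nan = bool(np.any(np.isnan(X)))

print(raised, X.dtype, has_nan)
```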