
I have trained and saved my H2O AutoML model. After reloading it, when I call the predict method, I get the following error: java.lang.IllegalArgumentException: Test/Validation dataset has a non-categorical column 'response' which is categorical in the training data

I did not specify any encoding when creating the model, yet I am getting this error now. Can anyone help me with this issue?

Any help will be highly appreciated.

ATUL AGARWAL
  • Can you specify which version of H2O you are using? This looks like an old bug that was fixed. – Erin LeDell Aug 16 '19 at 15:27
  • I am using 3.26.0.2 version of H2O. – ATUL AGARWAL Aug 17 '19 at 16:32
  • I am seeing a similar error on 3.26.0.3: java.lang.IllegalArgumentException: Test/Validation dataset has a categorical response column 'C30' with no levels in common with the model. If you remove the y label from the "test" set and pass it to model.predict(), then it works. Was this behavior changed recently? I am attempting to go through this diff: https://github.com/h2oai/h2o-3/compare/4854053b2e1773e6df02e04895709f692ebf7088...9d4c43ef5bd420a49af6df5bda3e1f89590d6c52 – dparkar Aug 27 '19 at 21:18
  • Hi Atul, were you able to get an answer on this? I am facing a similar issue. – Vikram Garg Jul 15 '20 at 13:33

2 Answers


This issue is related to new data: a particular column contains values that do not exist in the training set, so H2O parses it as a different type. In such cases I convert the column types to numeric (or string).

def _convert_h2oframe_to_numeric(h2o_frame, training_columns):
    # Force every training column to numeric so both frames use the same type.
    for column in training_columns:
        h2o_frame[column] = h2o_frame[column].asnumeric()
    return h2o_frame

Remember to apply this function to both the training frame and the prediction frame.
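
A more selective variant of the same idea, as a rough sketch: instead of converting every column, compare the column types of the two frames and convert only the columns that differ. The helper name `plan_type_fixes` is hypothetical (it is not part of H2O), and it works on plain dicts of column name to type string. That is the shape I would expect from `H2OFrame.types` (values such as 'enum', 'int', 'real'), but treat that as an assumption and check it against your H2O version.

```python
# Hypothetical helper, not part of the H2O API.
# Assumption: types are given as {column_name: type_string}, with
# 'enum' meaning categorical and 'int'/'real' meaning numeric.
NUMERIC_TYPES = {"int", "real"}


def plan_type_fixes(train_types, test_types):
    """Return {column: 'asfactor' or 'asnumeric'} for columns whose kind differs."""
    fixes = {}
    for column, train_type in train_types.items():
        test_type = test_types.get(column)
        if test_type is None or test_type == train_type:
            continue  # column missing from the test frame, or already matching
        if train_type == "enum":
            fixes[column] = "asfactor"
        elif train_type in NUMERIC_TYPES and test_type not in NUMERIC_TYPES:
            fixes[column] = "asnumeric"
    return fixes


train_types = {"response": "enum", "age": "int", "income": "real"}
test_types = {"response": "int", "age": "enum", "income": "int"}
print(plan_type_fixes(train_types, test_types))
# prints {'response': 'asfactor', 'age': 'asnumeric'}
```

Each entry could then be applied to the prediction frame with something like `frame[column] = getattr(frame[column], method)()`, since `asfactor()` and `asnumeric()` are the two conversions used in this thread.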

CezarySzulc

Maybe a little late, but this problem still occurs, especially if you have lots of columns. Here is what I did to solve it.

H2O gives one of two possible messages:

Test/Validation dataset has a non-categorical column '<YOUR-COLUMN>' which is categorical in the training data

or

Test/Validation dataset has categorical column '<YOUR-COLUMN>' which is real-valued in the training data

So I extract the column name from the message and convert that column to categorical or numeric, whichever the message calls for.

My Python code looks like this:

import h2o

# Assumes `df` and `rf_model` are already defined.
hf = h2o.H2OFrame(df)
transform = True
while transform:
    try:
        prediction = rf_model.predict(hf)
        transform = False
    except Exception as inst:
        err_msg = str(inst)
        tarr = err_msg.split('categorical')
        if len(tarr) < 2 or "'" not in tarr[1]:
            raise  # not a column-type mismatch, so do not swallow it
        column = tarr[1].split("'")[1]
        if tarr[0].endswith('-'):  # "non-categorical": convert to categorical
            hf[column] = hf[column].asfactor()
            print(f'{column} converted to categorical')
        else:  # convert to numeric
            hf[column] = hf[column].asnumeric()
            print(f'{column} converted to real-valued')
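
The message parsing can also be pulled out into a small function that is easy to test without a running H2O cluster. This is only a sketch: `parse_type_mismatch` is a hypothetical name, and the two patterns are copied from the messages quoted above, so they may need adjusting if your H2O version words them differently.

```python
import re

# Patterns taken from the two messages quoted in this answer (assumed wording).
_TO_CATEGORICAL = re.compile(
    r"has a non-categorical column '([^']+)' which is categorical"
)
_TO_NUMERIC = re.compile(
    r"has categorical column '([^']+)' which is real-valued"
)


def parse_type_mismatch(err_msg):
    """Return (column, method_name), or None if the message is not recognised."""
    match = _TO_CATEGORICAL.search(err_msg)
    if match:
        return match.group(1), "asfactor"
    match = _TO_NUMERIC.search(err_msg)
    if match:
        return match.group(1), "asnumeric"
    return None


msg = ("java.lang.IllegalArgumentException: Test/Validation dataset has a "
       "non-categorical column 'response' which is categorical in the training data")
print(parse_type_mismatch(msg))
# prints ('response', 'asfactor')
```

Returning None for anything else lets the caller re-raise unrelated errors instead of retrying forever.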

Hope it helps!