I am using H2O from R for a binary classification problem. The dataset has over 800 features, and some of the column names contain non-English characters, for example 'ö'.
I am getting the following error message:
Error in .verify_dataxy(params$training_frame, x, y): Invalid column names
This is followed by the list of columns that contain the problematic characters.
I have already googled and searched SO for documentation on H2O settings for accepted languages.
Here is a minimal example that reproduces the problem:
library(h2o)
h2o.init()
# Toy data: one column name contains a non-ASCII character ('ä')
sodata <- data.frame(Erklärung = sample(c(0,1), 50, replace = TRUE),
                     isPot = sample(c(0,1), 50, replace = TRUE),
                     target = sample(c(0,1), 50, replace = TRUE))
# Response column and predictor columns
tar <- "target"
pr <- setdiff(colnames(sodata), tar)
# Upload to H2O and split into training and test frames
sohex <- as.h2o(sodata)
spl <- h2o.splitFrame(data = sohex, ratios = .7, seed = 1)
training <- spl[[1]]
testing <- spl[[2]]
# Fit the GBM; this call raises the error
gbm1 <- h2o.gbm(x = pr,
                y = tar,
                training_frame = training,
                validation_frame = testing)
#
#h2o.shutdown()
The error message is:
Error in .verify_dataxy(training_frame, x, y):
  Invalid column names: Erklärung
Is there a way to change the accepted language in H2O?
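For reference, one possible workaround would be to rename the columns to plain ASCII before calling as.h2o(). The following is only a sketch in base R: to_ascii_names is a hypothetical helper (not part of H2O), the transliteration table covers only the common German letters, and it has not been confirmed that this is how H2O expects such names to be handled.

```r
# Hypothetical helper (not an H2O function): transliterate common German
# letters to ASCII, replace any remaining non-ASCII bytes, and make the
# result a syntactically valid, unique set of column names.
to_ascii_names <- function(nms) {
  nms <- enc2utf8(nms)
  # Unicode escapes avoid depending on the encoding of the script file
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae",     "oe",     "ue",     "Ae",     "Oe",     "Ue",     "ss")
  for (i in seq_along(from)) {
    nms <- gsub(from[i], to[i], nms, fixed = TRUE)
  }
  # Anything still outside ASCII is replaced byte-wise by "_"
  nms <- iconv(nms, from = "UTF-8", to = "ASCII", sub = "_")
  make.names(nms, unique = TRUE)
}

# Usage on a vector of names like those in the example above
old_names <- c("Erkl\u00e4rung", "isPot", "target")
new_names <- to_ascii_names(old_names)
print(new_names)
```

In the example above this would be applied as colnames(sodata) <- to_ascii_names(colnames(sodata)) before as.h2o(sodata). Keeping the original names in a lookup vector makes it possible to map variable importances back afterwards.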
Edit: session and environment info:
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
The output of Sys.getenv() shows nothing language-related.