I have a large dataset in csv format that I want to use to build a prediction model. Because of its size, I planned to use the h2o package in R to build the model. However, several columns of the data.frame contain Simplified Chinese characters, and h2o has difficulty reading the data correctly.
I've tried two different approaches. The first was to read the file directly with the h2o.importFile() function. However, this converts the Chinese characters into garbled text.
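For reference, a minimal sketch of what the first approach looks like. The temporary file and its contents are stand-ins I made up for illustration (my real file is much larger), and I am assuming the file is written as UTF-8:

```r
library(h2o)

# Start a local H2O cluster
h2o.init(nthreads = -1)

# Hypothetical stand-in for the real csv: a small UTF-8 file with Chinese text
csv_path <- tempfile(fileext = ".csv")
dat <- data.frame(x = rep(c("北京", "上海"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3))
write.csv(dat, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Import the file directly into H2O, bypassing R's own csv readers
h2o.dat <- h2o.importFile(path = csv_path)

# Inspect the first rows to see whether column x survived the import
head(h2o.dat)
```

Comparing the output of head(h2o.dat) with head(dat) shows whether the characters were altered during the import.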
The second approach was to first bring the data into R using readr's read_csv or base R's read.csv. After the data was loaded correctly into R, I tried to convert the data.frame into an h2o frame using the as.h2o function. However, this also resulted in garbled characters.
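As a sanity check on the R side, something like the following can confirm how the strings are encoded before they are handed to as.h2o. This is only a sketch using base R; the vector x is illustrative, and I am assuming the script itself is read as UTF-8:

```r
# Illustrative vector standing in for one of the Chinese text columns
x <- c("北京", "上海")

# Declared encoding of each string ("UTF-8", "latin1", "bytes" or "unknown")
Encoding(x)

# Convert to UTF-8 explicitly; strings that are already UTF-8 are unchanged
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)

# Compare character counts with byte counts to spot mis-decoded strings
nchar(x_utf8, type = "chars")
nchar(x_utf8, type = "bytes")
```

If the character and byte counts look wrong for two-character words, the problem is already present in R rather than being introduced by the conversion to h2o.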
To illustrate, I've written the following piece of code as an example:
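library(h2o)
# Small example data.frame with Simplified Chinese strings in column x
dat <- data.frame(x = rep(c("北京", "上海"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3))
h2o.init(nthreads = -1)  # start a local H2O cluster using all available cores
h2o.dat <- as.h2o(dat)   # column x comes out garbled in the resulting h2o frame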