Consider this modified classic example:
```r
library(dplyr)
library(tibble)

# tibble() replaces the deprecated data_frame()
dtrain <- tibble(
  text = c("Chinese Beijing Chinese",
           "Chinese Chinese Shanghai",
           "France",
           "Tokyo Japan Chinese"),
  add_numeric = c(1, 1, 0, 1),
  doc_id = 1:4,
  class = c(1, 1, 1, 0)
)
```
```
> dtrain
# A tibble: 4 x 4
  text                     add_numeric doc_id class
  <chr>                          <dbl>  <int> <dbl>
1 Chinese Beijing Chinese            1      1     1
2 Chinese Chinese Shanghai           1      2     1
3 France                             0      3     1
4 Tokyo Japan Chinese                1      4     0
```
Here, I would like to use a lasso to predict `class`. The variables of interest are `text` and `add_numeric`.
I know how to use `text2vec` or `tm` to predict `class` from `text` alone: these packages transform `text` into a sparse document-term matrix, which is then fed to the model.
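For reference, the text-only step can be sketched with text2vec roughly as follows. This is a minimal sketch assuming the text2vec package is installed; the `tolower` preprocessor and `word_tokenizer` are illustrative choices, not necessarily the pipeline used above. The result is a sparse `dgCMatrix` from the Matrix package, with one row per document and one column per term.

```r
# Sketch of the text-only step: turn the documents into a sparse
# document-term matrix with text2vec. Assumes text2vec is installed;
# lower-casing and word_tokenizer are illustrative choices.
library(text2vec)

text   <- c("Chinese Beijing Chinese", "Chinese Chinese Shanghai",
            "France", "Tokyo Japan Chinese")
doc_id <- 1:4

# Iterator over tokenised documents; doc_id becomes the DTM row names
it <- itoken(text,
             preprocessor = tolower,
             tokenizer    = word_tokenizer,
             ids          = doc_id,
             progressbar  = FALSE)

# Vocabulary -> vectorizer -> document-term matrix
vocab      <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocab)
dtm        <- create_dtm(it, vectorizer)  # sparse "dgCMatrix", 4 x 6
```

Because `glmnet` accepts sparse matrices directly, `dtm` can be passed as `x` without converting it to a dense matrix.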
Here, however, I want to use both the textual variable `text` and the numeric variable `add_numeric`, and I do not know how to combine the two approaches. Any ideas?
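One possible approach, sketched here as an assumption rather than a known recipe: a document-term matrix is just a sparse matrix, so `add_numeric` can be appended as an extra column with `cbind()` and the combined matrix handed to `glmnet`. To keep the block self-contained, it builds the DTM by hand with `Matrix::sparseMatrix()` instead of text2vec or tm. `fit_lasso` and its `unpenalized_last` argument are hypothetical names introduced for illustration; `penalty.factor` is a real `glmnet` argument.

```r
# Sketch: append a numeric column to a sparse document-term matrix and
# prepare a lasso fit. Only Matrix is needed to build X; glmnet is
# needed only when the hypothetical fit_lasso() is actually called.
library(Matrix)

text        <- c("Chinese Beijing Chinese", "Chinese Chinese Shanghai",
                 "France", "Tokyo Japan Chinese")
add_numeric <- c(1, 1, 0, 1)

# Hand-built DTM standing in for text2vec or tm output:
# one row per document, one column per lower-cased term, counts as values
tokens <- strsplit(tolower(text), " ", fixed = TRUE)
terms  <- sort(unique(unlist(tokens)))
dtm <- sparseMatrix(
  i = rep(seq_along(tokens), lengths(tokens)),  # document index of each token
  j = match(unlist(tokens), terms),             # term index of each token
  x = 1,                                        # duplicate (i, j) pairs are summed
  dims = c(length(text), length(terms)),
  dimnames = list(as.character(seq_along(text)), terms)
)

# The numeric predictor as a one-column sparse matrix, bound on the right
num <- Matrix(add_numeric, nrow = length(add_numeric), ncol = 1,
              sparse = TRUE, dimnames = list(NULL, "add_numeric"))
X <- cbind(dtm, num)  # still sparse: 6 term columns + add_numeric

# Hypothetical helper: lasso (alpha = 1) logistic regression on X.
# unpenalized_last = TRUE gives the last column (add_numeric) a penalty
# factor of 0, so it is never shrunk out of the model.
fit_lasso <- function(X, y, unpenalized_last = FALSE) {
  pf <- rep(1, ncol(X))
  if (unpenalized_last) pf[ncol(X)] <- 0
  glmnet::cv.glmnet(X, y, family = "binomial", alpha = 1,
                    penalty.factor = pf)
}
```

On a realistically sized training set, something like `fit_lasso(X, dtrain$class)` would run the lasso; the four-document toy set is too small for a binomial fit (one class has a single observation), so the helper is only defined here. `glmnet` standardizes columns by default (`standardize = TRUE`), so the word counts and `add_numeric` are penalized on a comparable scale. With tm, a `DocumentTermMatrix` stores its nonzero entries as triplets (`$i`, `$j`, `$v`), so it can be converted with a similar `sparseMatrix()` call before binding.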
Thanks!