
I have a Portuguese bank data set that I got from the UCI Machine Learning Repository, organized like so:

    > head(bank_data)
                 age       job marital   education default housing loan   contact month day_of_week       duration      campaign        pdays
    1  1.53301567694 housemaid married    basic.4y      no      no   no telephone   may         mon  0.01047129616 -0.5659151042 0.1954115279
    2  1.62897345569  services married high.school unknown      no   no telephone   may         mon -0.42149539806 -0.5659151042 0.1954115279
    3 -0.29018211937  services married high.school      no     yes   no telephone   may         mon -0.12451829578 -0.5659151042 0.1954115279
    4 -0.00230878311    admin. married    basic.6y      no      no   no telephone   may         mon -0.41378170709 -0.5659151042 0.1954115279
    5  1.53301567694  services married high.school      no      no  yes telephone   may         mon  0.18788618843 -0.5659151042 0.1954115279
    6  0.47748011065  services married    basic.9y unknown      no   no telephone   may         mon -0.23250996934 -0.5659151042 0.1954115279
           previous    poutcome emp.var.rate cons.price.idx cons.conf.idx    euribor3m  nr.employed targetVar
    1 -0.3494900415 nonexistent 0.6480843991    0.722713697  0.8864358006 0.7124512301 0.3316758805        no
    2 -0.3494900415 nonexistent 0.6480843991    0.722713697  0.8864358006 0.7124512301 0.3316758805        no
    3 -0.3494900415 nonexistent 0.6480843991    0.722713697  0.8864358006 0.7124512301 0.3316758805        no
    4 -0.3494900415 nonexistent 0.6480843991    0.722713697  0.8864358006 0.7124512301 0.3316758805        no
    5 -0.3494900415 nonexistent 0.6480843991    0.722713697  0.8864358006 0.7124512301 0.3316758805        no
    6 -0.3494900415 nonexistent 0.6480843991    0.722713697  0.8864358006 0.7124512301 0.3316758805        no

I am trying to use this data to build a neural network with either the nnet or the neuralnet package (whichever is easier or ends up working). It seems that before I can create the network, I must first transform all of the categorical variables into binary indicator (dummy) variables.
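
If `nnet` is the package you end up using, its formula method builds the design matrix itself (the *Details* section of `?nnet` says a factor response yields a classification network), so factor predictors and a factor response can be passed in unencoded. A minimal sketch, using R's built-in `warpbreaks` data as a stand-in for `bank_data` (the factor `wool` plays the role of `targetVar`, and `tension` is a factor predictor); `size = 2` is an arbitrary illustrative value, not a tuned one:

```r
library(nnet)  # recommended package that ships with R

set.seed(1)  # nnet starts from random initial weights

# wool (A/B) is a factor response; tension (L/M/H) is a factor predictor.
# The formula interface expands tension into dummy columns internally.
fit <- nnet(wool ~ breaks + tension, data = warpbreaks,
            size = 2,       # number of hidden units; required, no default
            trace = FALSE)  # suppress the optimizer's iteration log

# type = "class" returns the predicted factor levels as a character vector
pred <- predict(fit, newdata = warpbreaks, type = "class")
table(predicted = pred, actual = warpbreaks$wool)
```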

Is there a way to "one-hot" encode all of these columns at once?

I tried to use the mltools package:

    data <- one_hot(bank_data)

but this gives the following error:

    Error in `[.data.frame`(dt, , cols, with = FALSE) : unused argument (with = FALSE)
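
The `with = FALSE` in the error message is data.table indexing syntax, which `[.data.frame` does not accept; the mltools documentation describes the `dt` argument of `one_hot()` as a data.table. Converting the data frame first should therefore avoid the error. A minimal sketch, assuming the mltools and data.table packages are installed; the small `toy` data frame is a hypothetical stand-in for `bank_data`:

```r
library(data.table)
library(mltools)

# Hypothetical stand-in for bank_data: one numeric and two factor columns
toy <- data.frame(
  age  = c(1.53, -0.29, 0.48),
  job  = factor(c("housemaid", "services", "services")),
  loan = factor(c("no", "no", "yes"))
)

# one_hot() expects a data.table. With the default cols = "auto" it encodes
# every unordered factor column, and dropCols = TRUE removes the originals.
encoded <- one_hot(as.data.table(toy))
print(encoded)
```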

zsad512
  • Other possible duplicates: [How can I one-hot encode multiple variables with big data in R](https://stackoverflow.com/q/43578647/903061) or [How to one-hot encode factor variables with data.table?](https://stackoverflow.com/q/39905820/903061). I'm closing this. If the solutions at the duplicates *don't* work for you, please post some code showing what you tried and explaining how the result differs from your expectation. – Gregor Thomas Oct 26 '17 at 20:27
  • Try `model.matrix(targetVar ~ . + 0, data = bank_data)[, -1]` – Gregor Thomas Oct 26 '17 at 20:42
  • Oops, ignore the `[, -1]` in my last comment. Because of the `+ 0` in the formula it is not needed. – Gregor Thomas Oct 27 '17 at 12:49
  • @Gregor, thank you, this works. The only problem is that I also need to keep the one-hot encoding of `targetVar`; currently it is getting dropped from the matrix. – zsad512 Oct 27 '17 at 13:37
  • If you need them in the same matrix, then don't put `targetVar` on the left side of the formula: `model.matrix(~ . + 0, data = bank_data)`. But that doesn't sound right; I've never heard of one-hot encoding a target variable... – Gregor Thomas Oct 27 '17 at 13:43
  • Btw, you mention you might use `nnet`. If you use the formula interface for `nnet`, it will take care of the encoding for you: you can just do `nnet(targetVar ~ ., data = bank_data, ...)`, where `...` stands for the remaining arguments (`size` is required). – Gregor Thomas Oct 27 '17 at 13:45
  • @Gregor, what do you mean nnet will do it for me? That is what I have been trying to do! Please explain further... – zsad512 Oct 27 '17 at 13:47
  • If you give `nnet` a formula, you don't need to one-hot encode; it does the `model.matrix` for you. If you give `nnet` a matrix or a data frame, it assumes you have done it yourself. In the help, `?nnet`, the *Details* section begins "*If the response in `formula` is a factor, an appropriate classification network is constructed...*". If you are having trouble with that, I'd ask a separate question. Maybe also read through [this one that I answered a couple days ago](https://stackoverflow.com/q/46933775/903061). – Gregor Thomas Oct 27 '17 at 14:01
  • Can you make a sample of your data reproducible with `dput(head(your_data))` so I can address that cryptic error you're getting with mltools? – Ben Nov 08 '17 at 23:26
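
A caveat on the `model.matrix()` suggestion in the comments: when the formula has no intercept (`+ 0`), R codes only the *first* factor with one column per level, and every later factor still gets treatment contrasts, so its reference level is dropped. If every factor (including `targetVar`, if you do want it encoded) should get a full set of indicator columns, one common base R idiom is to pass identity ("no contrasts") coding through `contrasts.arg`. A minimal sketch; the `toy` data frame is a hypothetical stand-in for `bank_data`:

```r
# Hypothetical stand-in for bank_data
toy <- data.frame(
  age       = c(1.53, -0.29, 0.48),
  job       = factor(c("housemaid", "services", "admin.")),
  loan      = factor(c("no", "no", "yes")),
  targetVar = factor(c("no", "yes", "no"))
)

# Formula from the comments: job (the first factor) gets all of its levels,
# but loan is treatment-coded and loses its reference level "no"
x_default <- model.matrix(targetVar ~ . + 0, data = toy)

# Identity coding (one column per level) for every factor column
factor_cols <- names(toy)[vapply(toy, is.factor, logical(1))]
full_contrasts <- lapply(toy[factor_cols], contrasts, contrasts = FALSE)

# Predictors only: targetVar is the response here, so it is not
# expanded into columns even though it has an entry in full_contrasts
x_full <- model.matrix(targetVar ~ . + 0, data = toy,
                       contrasts.arg = full_contrasts)

# Everything, including targetVar, fully one-hot encoded in one matrix
all_full <- model.matrix(~ . + 0, data = toy,
                         contrasts.arg = full_contrasts)
colnames(all_full)
```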
