
I am trying to transform each of my predictions into an N-column vector. For example, say my predictions are a factor with 3 levels, and I would like to write each prediction as a vector of length 3.

My current output is:

Id Prediction
1  Prediction 1 
2  prediction 2 
3  prediction 3

and what I am trying to achieve is:

Id  Prediction1 Prediction2 Prediction3
1    0               0               1
2    1               0               0  

What is a simple way of achieving this in R?

Ali Ahmad
  • That data doesn't match your expected output. See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg Aug 30 '15 at 10:28
  • Also, my guess is that you are looking for this: http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r – David Arenburg Aug 30 '15 at 10:52

4 Answers

It looks like you want to perform so-called "one-hot encoding" of your Prediction factor variable by introducing dummy variables. One way to do so is to use the caret package.
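To make the idea concrete, here is a minimal base-R sketch (no caret involved) of what one-hot encoding produces: every value is compared with every level of the factor, giving one 0/1 column per level. The vector x is made up for illustration.

```r
# Made-up example: a factor with three levels
x <- factor(c("Prediction 3", "Prediction 1", "Prediction 2", "Prediction 3"))

# outer() compares every value with every level, giving a
# length(x) x nlevels(x) logical matrix; "* 1" turns TRUE/FALSE into 1/0
one_hot <- outer(as.character(x), levels(x), "==") * 1
colnames(one_hot) <- levels(x)
one_hot
```

Each row contains exactly one 1, in the column of that row's level.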

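Suppose you have a data frame like this (stringsAsFactors = TRUE makes Prediction a factor, which data.frame() no longer does by default since R 4.0.0):

> df <- data.frame(Id = c(1, 2, 3, 4), Prediction = c("Prediction 3", "Prediction 1", "Prediction 2", "Prediction 3"), stringsAsFactors = TRUE)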
> df
  Id   Prediction
1  1 Prediction 3
2  2 Prediction 1
3  3 Prediction 2
4  4 Prediction 3

First make sure you have the caret package installed and loaded.

> install.packages('caret')
> library(caret) 

You can then use caret's dummyVars() function to create dummy variables.

> dummies <- dummyVars( ~ Prediction, data = df, levelsOnly = TRUE)

The first argument to dummyVars(), a formula, tells it to generate dummy variables for the Prediction factor in the data frame df. (levelsOnly = TRUE strips the variable name from the column names, leaving just the level, which looks nicer in this case.)

The resulting dummyVars object can then be passed, together with the data, to the predict() function to generate a matrix of one-hot encoded columns.

> encoded <- predict(dummies, df)
> encoded
  Prediction 1 Prediction 2 Prediction 3
1            0            0            1
2            1            0            0
3            0            1            0
4            0            0            1

You can then, for example, create a new data frame with the encoded variables in place of the original factor variable (note that data.frame() turns the spaces in the column names into dots):

> data.frame(Id = df$Id, encoded)
  Id Prediction.1 Prediction.2 Prediction.3
1  1            0            0            1
2  2            1            0            0
3  3            0            1            0
4  4            0            0            1

This technique generalises easily to a mixture of numerical and categorical variables. Here's a more general example:

> df <- data.frame(Id = c(1, 2, 3, 4), Var1 = c(3.4, 2.1, 6.0, 4.7), Var2 = c("B", "A", "B", "A"), Var3 = c("Rainy", "Sunny", "Sunny", "Cloudy"), stringsAsFactors = TRUE)
> dummies <- dummyVars(Id ~ ., data = df)
> encoded <- predict(dummies, df)
> encoded
  Var1 Var2.A Var2.B Var3.Cloudy Var3.Rainy Var3.Sunny
1  3.4      0      1           0          1          0
2  2.1      1      0           0          0          1
3  6.0      0      1           0          0          1
4  4.7      1      0           1          0          0

All numerical variables remain unchanged, whereas all categorical variables get encoded. A typical situation where this is useful is to prepare data for a machine learning algorithm that only accepts numerical variables, not categorical variables.
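For comparison, here is a sketch of the same mixed encoding in base R without caret, assuming Var2 and Var3 are factors. It swaps in model.matrix() with identity contrast matrices from contrasts(..., contrasts = FALSE), so that every level keeps its own column. This is an alternative technique, not caret's own code, and the column names come out without the dot separator (Var2A rather than Var2.A).

```r
df <- data.frame(Id = c(1, 2, 3, 4),
                 Var1 = c(3.4, 2.1, 6.0, 4.7),
                 Var2 = factor(c("B", "A", "B", "A")),
                 Var3 = factor(c("Rainy", "Sunny", "Sunny", "Cloudy")))

# Identity contrast matrices: one column for every level of every factor
factor_cols <- names(df)[sapply(df, is.factor)]
all_levels <- lapply(df[factor_cols], contrasts, contrasts = FALSE)

# "- Id" leaves the identifier out; numeric columns pass through unchanged
mm <- model.matrix(~ . - Id, data = df, contrasts.arg = all_levels)

# Drop the intercept column that model.matrix() adds by default
encoded <- mm[, colnames(mm) != "(Intercept)"]
encoded
```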

WhiteViking

You can use something like:

as.numeric(data[1,][2:4])

where 1 is the number of the row you are converting to a vector, and 2:4 are the columns that hold the predictions.
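A small sketch of that, assuming data is already in the wide layout from the question (Id in column 1, the three 0/1 prediction columns in columns 2 to 4); the values below are made up.

```r
# Made-up wide data: Id followed by three 0/1 indicator columns
data <- data.frame(Id = c(1, 2),
                   Prediction1 = c(0, 1),
                   Prediction2 = c(0, 0),
                   Prediction3 = c(1, 0))

# Columns 2 to 4 of row 1, as a plain (unnamed) numeric vector
row_vec <- as.numeric(data[1, ][2:4])
row_vec

# unlist() gives the same values but keeps the column names
unlist(data[1, 2:4])
```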

rhozzy

Taking WhiteViking's example data frame and using the table() function seems to work:

> df <- data.frame(Id = c(1, 2, 3, 4), Prediction = c("Prediction 3", "Prediction 1", "Prediction 2", "Prediction 3"))
> df
  Id   Prediction
1  1 Prediction 3
2  2 Prediction 1
3  3 Prediction 2
4  4 Prediction 3
> table(df$Id, df$Prediction)

    Prediction 1 Prediction 2 Prediction 3
1            0            0            1
2            1            0            0
3            0            1            0
4            0            0            1
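If you need a data frame with an Id column rather than a table, one option (a sketch, not part of the original answer) is to convert the table with as.data.frame.matrix(), which keeps the wide layout; plain as.data.frame() on a table would return the long Var1/Var2/Freq format instead.

```r
df <- data.frame(Id = c(1, 2, 3, 4),
                 Prediction = c("Prediction 3", "Prediction 1",
                                "Prediction 2", "Prediction 3"))

# Rows = Id, columns = prediction level, cells = counts (0 or 1 here)
counts <- table(df$Id, df$Prediction)

# Keep the wide layout, then put the Id back as a proper column
wide <- as.data.frame.matrix(counts)
wide <- data.frame(Id = as.numeric(rownames(wide)), wide, check.names = FALSE)
wide
```

Because table() counts occurrences, an Id that appears more than once would get values larger than 1.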
zacdav

I would use the reshape() function.
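A minimal sketch of how that could look with base R's reshape(), reusing the example data frame from the other answers. reshape() needs a value column to spread, so this adds an indicator column called value (an assumption, not part of the question's data) and fills the missing combinations with 0.

```r
df <- data.frame(Id = c(1, 2, 3, 4),
                 Prediction = c("Prediction 3", "Prediction 1",
                                "Prediction 2", "Prediction 3"))

# reshape() spreads a value column, so give every row an indicator of 1
df$value <- 1

# Long to wide: one row per Id, one "value.<level>" column per Prediction level
wide <- reshape(df, idvar = "Id", timevar = "Prediction", direction = "wide")

# Levels an Id never had come back as NA; count them as 0
wide[is.na(wide)] <- 0

# Tidy up: drop the "value." prefix and put the level columns in sorted order
names(wide) <- sub("^value\\.", "", names(wide))
wide <- wide[, c("Id", sort(setdiff(names(wide), "Id")))]
wide
```

Without the final reordering, the level columns would appear in order of first appearance in the data (Prediction 3, Prediction 1, Prediction 2).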

Marc