Here's a data frame I'm working with:

c1 = c('a', 'b', 'c', 'd')
c2 = c('d', 'a', 'd', 'c')
c3 = c('a', 'c', 'd', 'b')
c4 = c('a', 'c', 'b', 'd')
df = data.frame(c1, c2, c3, c4)

c1    c2    c3    c4
a     d     a     a
b     a     c     c
c     d     d     b
d     c     b     d

I would like to convert the letters to numbers using this scale: a=1, b=2, c=3, d=4, so that I get something like this:

c1 c2 c3 c4
  1  4  1  1
  2  1  3  3
  3  4  4  2
  4  3  2  4

This is what I have come up with:

for(i in colnames(df)){
    df$i = gsub("a", 1, df$i)
    df$i = gsub("b", 2, df$i)
    df$i = gsub("c", 3, df$i)
    df$i = gsub("d", 4, df$i)
 }

But it doesn't work. Should I use gsub here, or is there a simpler way to do this?

jason adams
    similar to the answer below, if your key wasn't sequential, you could make your own `key <- c('a'='1', 'b'='2', 'c'='3', 'd'='4'); df[] <- key[as.matrix(df)]` – rawr Dec 07 '14 at 04:13

1 Answer

We can do this in a couple of ways. One way is to convert the data.frame to a matrix and then match its elements against the unique values in the dataset, in this case letters[1:4]. match returns a plain vector, so we restore the dimensions of the original dataset by passing dim(df) to `dim<-`, i.e. `dim<-`(..., dim(df)). For more details about this form of assignment, see the question linked in the comments: http://stackoverflow.com/q/10449366/210673

df2 <- df
df2[] <- `dim<-`(match(as.matrix(df), letters[1:4]), dim(df))
df2
#  c1 c2 c3 c4
#1  1  4  1  1
#2  2  1  3  3
#3  3  4  4  2
#4  4  3  2  4

The above code can be split into separate lines:

v1 <- match(as.matrix(df), letters[1:4])
df2[] <- `dim<-`(v1, dim(df))

or

df2[] <- matrix(v1, ncol=ncol(df), nrow=nrow(df))
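As a rough illustration of why these forms are interchangeable, the sketch below rebuilds the example data and checks that calling `dim<-` as a function, using the ordinary `dim(x) <- value` assignment, and calling matrix() all give the same result. The names m1, m2 and m3 are made up for this example.

```r
# Rebuild the example data from the question.
df <- data.frame(c1 = c('a', 'b', 'c', 'd'),
                 c2 = c('d', 'a', 'd', 'c'),
                 c3 = c('a', 'c', 'd', 'b'),
                 c4 = c('a', 'c', 'b', 'd'))

# match() returns a plain integer vector of positions, in column-major order.
v1 <- match(as.matrix(df), letters[1:4])

# Form 1: call the replacement function directly; it returns the modified object.
m1 <- `dim<-`(v1, dim(df))

# Form 2: the usual assignment syntax, applied to a copy (m2 is an illustrative name).
m2 <- v1
dim(m2) <- dim(df)

# Form 3: an explicit matrix() call with the same shape.
m3 <- matrix(v1, nrow = nrow(df), ncol = ncol(df))

# All three should be the same 4 x 4 integer matrix.
stopifnot(identical(m1, m2), identical(m1, m3))
```

The function-call form is only a convenience for doing the reshaping inside a single expression; the two-step assignment does the same thing and may be easier to read.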

Another option is to convert each column to a factor, with the levels specified as the unique values in the dataset, and then convert it to numeric with as.numeric. This can be done for every column with lapply:

df2[] <- lapply(df, function(x) as.numeric(factor(x, levels=letters[1:4])))
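As for why the loop in the question fails: df$i looks for a column literally named "i", not the column whose name is stored in i, so nothing useful is selected. A minimal sketch of a working explicit loop is below, assuming a named lookup vector in the spirit of rawr's comment (the name key and its numeric values are chosen here for illustration). Indexing with [[i]] uses the value of i.

```r
# Rebuild the example data from the question.
df <- data.frame(c1 = c('a', 'b', 'c', 'd'),
                 c2 = c('d', 'a', 'd', 'c'),
                 c3 = c('a', 'c', 'd', 'b'),
                 c4 = c('a', 'c', 'b', 'd'))

# Illustrative lookup vector: names are the letters, values are the scores.
# The values need not be sequential.
key <- c(a = 1, b = 2, c = 3, d = 4)

df2 <- df
for (i in colnames(df2)) {
  # df2[[i]] selects the column named by the value of i.
  # as.character() guards against the columns being factors.
  df2[[i]] <- unname(key[as.character(df2[[i]])])
}
df2
```

This also avoids gsub, which returns character strings, so the result would still need converting to numeric afterwards.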
akrun
  • Really nice, and more or less exactly what I had in mind, but doing so in one line feels a little like showing off, don't you think? :) Separating it out and explaining a little bit I'm sure would be greatly appreciated. – Aaron left Stack Overflow Dec 07 '14 at 03:52
  • @akrun, what does `dim<-` do here? is this a built in function for r? – jason adams Dec 07 '14 at 04:05
  • @akrun, never mind, for some reason there are these ` that show up surrounding dim. That's why I was wondering if I have ever seen such a thing. – jason adams Dec 07 '14 at 04:09
  • @jasonadams I had seen others using this way. But, if you are uncomfortable using that, the second one is standard ie. using the `matrix` call. – akrun Dec 07 '14 at 04:10
  • Thanks, @akrun, that's nice. For anyone interested in the `dim<-` sorcery (as it's called in this answer), see this great question/answer: http://stackoverflow.com/q/10449366/210673 – Aaron left Stack Overflow Dec 08 '14 at 03:17
  • @Aaron Thanks for providing the link. I will add this link to the post. – akrun Dec 08 '14 at 03:45