38

I want to create a new column based on 4 values in another column.

if col1 = 1 then col2 = G;
if col1 = 2 then col2 = H;
if col1 = 3 then col2 = J;
if col1 = 4 then col2 = K.

How do I do this in R? I have tried if/else and ifelse, but neither seems to work. Thanks.

Gavin Simpson
nolyugo
  • What programming language are you using? – The GiG Oct 05 '11 at 08:03
  • @TheGiG The OP marked the question with [tag:r] – Andrie Oct 05 '11 at 08:36
  • Highly related: [case statement equivalent](http://stackoverflow.com/q/4622060/168747), [How do add a column in a `data.frame`?](http://stackoverflow.com/q/4562547/168747), [Data cleaning in Excel sheets](http://stackoverflow.com/q/7374314/168747) (in this one another set of links). – Marek Oct 05 '11 at 10:00

4 Answers

39

You could use nested ifelse:

col2 <- ifelse(col1==1, "G",
        ifelse(col1==2, "H",
        ifelse(col1==3, "J",
        ifelse(col1==4, "K",
                        NA  )))) # all other values map to NA

For this simple case it's overkill, but the same pattern extends to more complicated conditions, such as ones that depend on several unrelated columns.
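A minimal sketch of a more complicated case, where each condition looks at a different column. The data frame `df` and its columns `age`, `member` and `rate` are hypothetical, made up purely for illustration:

```r
# Hypothetical data: 'age' and 'member' are invented for this example
df <- data.frame(age    = c(15, 30, 70, 45),
                 member = c(TRUE, FALSE, FALSE, TRUE))

# One line per condition; for each row the first condition that is TRUE wins
df$rate <- ifelse(df$age < 18,  "child",
           ifelse(df$member,    "member",
           ifelse(df$age >= 65, "senior",
                                "standard")))
df$rate
```

Because the conditions are checked in order, a row that satisfies several of them gets the value from the first match.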

Marek
  • "but for more complicated ones..." -- more complicated ones make nested `ifelse` a **better** idea? That's counterintuitive for me. – Nate Anderson Sep 04 '16 at 16:16
  • @TheRedPea For more complicated conditions, based on different columns, unrelated to each other, etc. One line for one condition. – Marek Sep 05 '16 at 20:17
  • Yeah, I guess one may not have a choice but to express logic with if statements. – Nate Anderson Sep 07 '16 at 16:16
25

You have a special case of a lookup in which the keys are the integers 1:4. This means you can use vector indexing to solve your problem in one easy step.

First, create some sample data:

set.seed(1)
dat <- data.frame(col1 = sample(1:4, 10, replace = TRUE))

Next, define the lookup values, and use [ subsetting to find the desired results:

values <- c("G", "H", "J", "K")
dat$col2 <- values[dat$col1]

The results:

dat
   col1 col2
1     2    H
2     2    H
3     3    J
4     4    K
5     1    G
6     4    K
7     4    K
8     3    J
9     3    J
10    1    G

More generally, you can use [ subsetting combined with match to solve this kind of problem:

index <- c(1, 2, 3, 4)
values <- c("G", "H", "J", "K")
dat$col2 <- values[match(dat$col1, index)]
dat
   col1 col2
1     2    H
2     2    H
3     3    J
4     4    K
5     1    G
6     4    K
7     4    K
8     3    J
9     3    J
10    1    G
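
As a small sketch of why `match` is the more general approach: it still works when the keys cannot be used directly as positions, and keys without a lookup entry come back as `NA`. The codes 10, 20, 30, 40 and 99 below are made up for illustration:

```r
# Hypothetical keys that are not valid positions into 'values'
index  <- c(10, 20, 30, 40)
values <- c("G", "H", "J", "K")
col1   <- c(20, 40, 10, 99)   # 99 has no entry in 'index'

# match() returns the position of each col1 value in 'index' (NA if absent),
# and indexing 'values' with NA yields NA
col2 <- values[match(col1, index)]
col2
```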
Andrie
8

There are a number of ways of doing this, but here's one.

set.seed(357)
mydf <- data.frame(col1 = sample(1:4, 10, replace = TRUE))
mydf$col2 <- NA_character_             # start with a column of missing values
mydf$col2[mydf$col1 == 1] <- "A"       # fill in one value of col1 at a time
mydf$col2[mydf$col1 == 2] <- "B"
mydf$col2[mydf$col1 == 3] <- "C"
mydf$col2[mydf$col1 == 4] <- "D"

   col1 col2
1     1    A
2     1    A
3     2    B
4     1    A
5     3    C
6     2    B
7     4    D
8     3    C
9     4    D
10    4    D

Here's one using car's recode.

library(car)
# car::recode takes the recode rules as a single string, separated by semicolons
mydf$col3 <- car::recode(mydf$col1, "1 = 'A'; 2 = 'B'; 3 = 'C'; 4 = 'D'")

One more from this question:

mydf$col4 <- c("A", "B", "C", "D")[mydf$col1]
A.Elsy
Roman Luštrik
1

You could have a look at ?symnum.

In your case, something like:

col2<-symnum(col1, seq(0.5, 4.5, by=1), symbols=c("G", "H", "J", "K"))

should get you close.
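
A hedged sketch of how that might be used in practice. `symnum` returns a `noquote` object rather than a plain character vector, so this example assumes you want to convert it with `as.character` before storing it as a column; the sample `col1` values are made up:

```r
col1 <- c(2, 4, 1, 3)   # made-up sample values

# Each value falls into one interval between consecutive cutpoints
# (0.5-1.5, 1.5-2.5, ...) and receives the corresponding symbol
sym <- symnum(col1,
              cutpoints = seq(0.5, 4.5, by = 1),
              symbols   = c("G", "H", "J", "K"),
              legend    = FALSE)

# Drop the noquote class to get an ordinary character vector
col2 <- as.character(sym)
col2
```

Note that `symnum` stops with an error for values outside the range of the cutpoints, so this only suits data known to lie between 0.5 and 4.5.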

Nick Sabbe