generate labels for variables in R

Question

I'm looking for a better or faster way than the following to generate labels for a variable:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"

df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

For simplicity, this example has only one variable. My real dataset has more, but I only want to change a specific one.

Related, perhaps helpful, but not necessarily duplicated: http://stackoverflow.com/q/10431403 — BenBarnes, Aug 13 '13 at 13:29

Thomas · Accepted Answer · 2013-08-13T12:49:50.410

You don't need sapply, because the comparison operators and %in% are already vectorized:

df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
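
One detail worth knowing about the four assignments above: the first one converts the numeric column to character, so the later comparisons such as df$a==0 are done on strings. A minimal sketch of that, rebuilding df and pick from the question so it runs on its own (class_before and class_after are names chosen just for this illustration):

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))
pick <- c(0, 1, 2, 3, 10)

class_before <- class(df$a)   # "numeric"

# Assigning a string into a numeric vector coerces the whole column to character
df$a[!df$a %in% pick] <- "a"
class_after <- class(df$a)    # "character"

# From here on the comparisons are string comparisons ("0" == "0", and so on)
df$a[df$a == 0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a == 10] <- "d"

df$a
```

If the original numbers are still needed afterwards, it may be safer to write the labels into a separate vector or column instead of overwriting df$a.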

You could also produce the same result with factors:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))

# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"

# one way (duplicated labels gave a warning before R 3.4.0)
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))

# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)

# a third strategy using recode() from the car package
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")

# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
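
Because the question also asks for a faster way, here is a rough sketch of how the sapply version and the vectorized version could be timed against each other. The vector big and the helpers label_sapply and label_vec are made up for this illustration. Actual timings depend on the machine, so none are shown.

```r
set.seed(1)
big <- sample(0:10, 1e5, replace = TRUE)   # made-up test data

# Element-by-element version, in the spirit of the code in the question
label_sapply <- function(x) {
  out <- rep("a", length(x))
  out[sapply(x, function(v) v == 0)] <- "b"
  out[sapply(x, function(v) v %in% 1:3)] <- "c"
  out[sapply(x, function(v) v == 10)] <- "d"
  out
}

# Vectorized version, as in the answer above
label_vec <- function(x) {
  out <- rep("a", length(x))
  out[x == 0] <- "b"
  out[x %in% 1:3] <- "c"
  out[x == 10] <- "d"
  out
}

t_sapply <- system.time(r_sapply <- label_sapply(big))
t_vec    <- system.time(r_vec <- label_vec(big))

# Both versions should produce the same labels
identical(r_sapply, r_vec)

# Compare the "elapsed" entries to see the difference on your machine
t_sapply["elapsed"]
t_vec["elapsed"]
```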

score 2 · Answer 2 · answered Aug 13 '13 at 12:48


You might also consider mapvalues in plyr, particularly if you're dealing with more labels (revalue is similar, but it only accepts factors and character vectors):

df$a <- plyr::mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[!df$a %in% c("b", "c", "d")] <- "a" # everything that was not in pick


Peyton


A5C1D2H2I1M1N2O1R2T1 · Answer 3 · 2013-08-13T19:05:13.443

Here is another fairly straightforward solution:

names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

It is even more straightforward if you include an NA in your "pick" object.

pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

If you use this second alternative, note that nomatch takes an integer: the position, in the vector you are matching against, that is returned when there is no match. Here, nomatch = 1 points at the NA in the first position of your "pick" vector, which is named "a". If the NA were in the last position, you would use nomatch = 6 instead.
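
As a sketch of that last case, with the NA moved to the end of the lookup vector (pick_last and labels_last are names chosen for this illustration, and df is rebuilt from the question so the block runs on its own):

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# Same lookup as above, but with the NA "catch-all" in the last position
pick_last <- c(0, 1, 2, 3, 10, NA)
names(pick_last) <- c("b", "c", "c", "c", "d", "a")

# nomatch must now point at position 6; length(pick_last) avoids hard-coding it
labels_last <- names(pick_last)[match(df$a, pick_last, nomatch = length(pick_last))]
labels_last
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```

One thing to be aware of with this approach: a genuine NA in df$a would also match the NA entry in the lookup vector, so it would receive the label "a" as well.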

score 0 · Answer 4 · answered Aug 13 '13 at 13:27


You can also use the ifelse function:

with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
 [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"


Metrics
