
I'm looking for a better/faster way than the following to generate labels for a variable:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"

df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c" 

For simplicity, this example has only one variable. My dataset of course has more variables, but I only want to change this specific one.

beginneR

4 Answers


You don't need sapply; == and %in% are already vectorized, so you can index the column directly:

df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
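One thing to watch with the version above: the first assignment turns df$a into a character vector, so the later comparisons only work because R coerces 0, 1:3 and 10 to strings. A minimal sketch that avoids this by starting from the default label and computing every mask on the original numeric values. The function names label_a and label_sapply and the simulated 1e5-element vector are illustrative assumptions; system.time() is only a rough way to compare the two on your own machine.

```r
pick <- c(0, 1, 2, 3, 10)

# Element-wise version from the question, wrapped in a function for timing
label_sapply <- function(v) {
  v[sapply(v, function(z) !(z %in% pick))] <- "a"
  v[sapply(v, function(z) z == 0)] <- "b"
  v[sapply(v, function(z) z == 1 | z == 2 | z == 3)] <- "c"
  v[sapply(v, function(z) z == 10)] <- "d"
  v
}

# Vectorized version: every element starts as the default label "a",
# then masks computed on the untouched numeric input overwrite it
label_a <- function(v) {
  out <- rep("a", length(v))
  out[v == 0] <- "b"
  out[v %in% 1:3] <- "c"
  out[v == 10] <- "d"
  out
}

df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))
df$a <- label_a(df$a)
df$a
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

# Rough timing on made-up data (1e5 values drawn from 0:10)
set.seed(1)
x <- sample(0:10, 1e5, replace = TRUE)
system.time(r1 <- label_sapply(x))
system.time(r2 <- label_a(x))
identical(r1, r2)  # TRUE
```

Because label_a never compares against the labels it writes, the order of the assignments only matters where masks overlap, and here they do not.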

You could also produce the same result with factors:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))

# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"

# one way using duplicated labels (warns in R < 3.4.0, accepted since then)
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))

# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)

# a third strategy using `library(car)`
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")

# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
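The b1 and b2 versions differ only in how many "a" labels are counted by hand (six for the levels 4:9, four for the levels actually present in the data). A sketch that derives the labels from the observed levels instead, so nothing has to be counted. It relies on factor() accepting duplicated labels, which R supports from version 3.4.0 (older versions warn); the names lv, labs and b4 are just for illustration.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# One label per distinct value, using the same rules as the other answers
lv <- sort(unique(df$a))
labs <- rep("a", length(lv))   # default label for values not in pick
labs[lv == 0] <- "b"
labs[lv %in% 1:3] <- "c"
labs[lv == 10] <- "d"

# Duplicated labels are merged into a single level (R >= 3.4.0)
b4 <- factor(df$a, levels = lv, labels = labs)
nlevels(b4)  # 4
as.character(b4)
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```

Since the rules are applied to the distinct values rather than to every element, this stays cheap even when df$a is long.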
Thomas

You might also consider mapvalues or revalue in plyr, particularly if you're dealing with more labels:

library(plyr)
df$a <- mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[! df$a %in% c("b", "c", "d")] <- "a" # values that were not in pick
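mapvalues() on its own leaves unmatched values unchanged (as character strings), which is why the second line is needed. If you would rather skip the plyr dependency and handle the default in one step, a small base R function in the same spirit can do it with match(). map_with_default is a hypothetical name, not a function from plyr or base R.

```r
# Hypothetical helper (not part of plyr or base R): replace each value found
# in `from` with the matching entry of `to`, and everything else with `default`
map_with_default <- function(x, from, to, default) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)        # position in `from`, or NA if absent
  out <- to[idx]               # NA wherever idx is NA
  out[is.na(idx)] <- default
  out
}

df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))
df$a <- map_with_default(df$a, c(0, 1, 2, 3, 10),
                         c("b", "c", "c", "c", "d"), default = "a")
df$a
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```

Unlike the two-step version, this never inspects the new labels, so it still works if an original value happens to look like one of them.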
Peyton

Here is another fairly straightforward solution:

names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

It is even more straightforward if you include an NA placeholder for the default label in your "pick" object:

pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

If you use this second alternative, note that nomatch is the value match() returns for elements that have no match, and here that value is used as an index into "pick". Position 1 holds the NA placeholder named "a", so every unmatched value gets the label "a". If the NA were in the last position, you would use nomatch = 6 instead.
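The same match() idea also works with the mapping stored as a small two-column table, which can be easier to maintain (or to read from a file) when there are many labels. A sketch, assuming a lookup data frame with hypothetical columns value and label; stringsAsFactors = FALSE keeps label as character on R versions before 4.0.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# Mapping from original values to labels; anything not listed gets "a"
lookup <- data.frame(value = c(0, 1, 2, 3, 10),
                     label = c("b", "c", "c", "c", "d"),
                     stringsAsFactors = FALSE)

df$a <- lookup$label[match(df$a, lookup$value)]
df$a[is.na(df$a)] <- "a"
df$a
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```

Only column a is rewritten, so any other columns in df are left as they were.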

A5C1D2H2I1M1N2O1R2T1

You can also use nested ifelse() calls:

with(df, ifelse(a == 0, "b",
         ifelse(a %in% c(1, 2, 3), "c",
         ifelse(a == 10, "d", "a"))))
 [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
Metrics