3

In R I have a data.frame and I'd like to do a bulk update.

My table looks like

Col1  Col2  Col3
A      123   456
A      789   012
B      345   678
B      789   012

I want to scan over the table and replace A with "Apple" and B with "Banana"

In my case, the list of replacements is quite long (~30 items) so I have them both in lists like:

old<-c('A','B')
new<-c('Apple','Banana')
Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
Exie
  • 466
  • 5
  • 16

3 Answers3

4

I like working with named vectors:

   df <- data.frame(Col1=c('A','A','B','B'), 
                    Col2=c(123,789,345,789), 
                    Col3=c(456,012,678,012))
oldv <- c('A','B')
newv <- c('Apple','Banana')
names(newv) <- oldv
df$Col1 <- newv[ df$Col1 ]

yields

  > df
    Col1 Col2 Col3
1  Apple  123  456
2  Apple  789   12
3 Banana  345  678
4 Banana  789   12
Mike Wise
  • 22,131
  • 8
  • 81
  • 104
  • 2
    Yes, made it to 1800 :) – Mike Wise Jul 07 '15 at 22:49
  • I _knew_ I was forgetting something like that. It's even generalizable to multidimensional situations. – IRTFM Jul 07 '15 at 22:49
  • I have used this on like 4 questions in the last few weeks after I picked it up last month in the first chapter of Hadley's Advanced R book... I guess it is not that well known. – Mike Wise Jul 07 '15 at 22:51
  • 2
    I'm quite sure I have used it on SO before, and I know that I learned it on rhelp – IRTFM Jul 07 '15 at 22:51
  • Beware that `df$Col1` is a factor in this example - you need `newv[ as.character(df$Col1) ]` to be safe. E.g., try it with `levels(df$Col1) <- c("B","A")` and see how `newv[df$Col1]` and `newv[as.character(df$Col1)]` give differing results – thelatemail Jul 08 '15 at 00:37
2

You could also use lookup() from qdapTools:

ref <- data.frame(old = c("A", "B"),
                  new = c("Apple", "Banana"))

library(qdapTools)
df$Col1 <- lookup(df, ref)

Alternatively, you could use the %l% operator if you prefer the syntax:

df$Col1 <- df %l% ref

Or using base R:

df$Col1 <- ref$new[match(df$Col1, ref$old)]

Which gives:

#    Col1 Col2 Col3
#1  Apple  123  456
#2  Apple  789   12
#3 Banana  345  678
#4 Banana  789   12
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
1

This feels a bit hamfisted since I try to avoid for-loops on aesthetic grounds:

dat$Col1 <- as.character(dat$Col1)
for ( i in seq_along(old) ) {dat$Col1 [ dat$Col1 == old[i] ] <- new[i] }

> dat
    Col1 Col2 Col3
1  Apple  123  456
2  Apple  789   12
3 Banana  345  678
4 Banana  789   12
IRTFM
  • 258,963
  • 21
  • 364
  • 487