20

I have a data frame with a character column:

df <- data.frame(var1 = c("aabbcdefg", "aabbcdefg"))
df
#        var1
# 1 aabbcdefg
# 2 aabbcdefg

I want to replace several different individual characters, e.g. from "a" to "h", from "b" to "i" and so on. Currently I use several calls to gsub:

df$var1 <- gsub("a", "h", df$var1)
df$var1 <- gsub("b", "i", df$var1)
df$var1 <- gsub("c", "j", df$var1)
df$var1 <- gsub("d", "k", df$var1)
df$var1 <- gsub("e", "l", df$var1)
df$var1 <- gsub("f", "m", df$var1)
df
#        var1
# 1 hhiijklmg
# 2 hhiijklmg

However, I'm sure there are more elegant solutions. Any ideas how to proceed?

Henrik
jrara

3 Answers

39

You want chartr:

df$var1 <- chartr("abcdef", "hijklm", df$var1)
df
#        var1
# 1 hhiijklmg
# 2 hhiijklmg
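One property of chartr worth illustrating is that it translates all characters in a single pass, whereas chained gsub calls see the output of the previous call. The following is a minimal sketch with made-up input (the vector `x` and the swap of "a" and "b" are illustrative assumptions, not part of the question):

```r
# Hypothetical input, chosen to show a swap of "a" and "b"
x <- c("aabbcdefg", "abba")

# chartr maps every character at once, so a swap works as expected
swapped <- chartr("ab", "ba", x)

# Chained gsub calls: the second call also rewrites the "b"s
# that the first call has just produced
chained <- gsub("b", "a", gsub("a", "b", x))

print(swapped)
print(chained)
```

Note that chartr only handles single-character substitutions, and it raises an error if the `old` string is longer than the `new` string.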
Marek
22

You can create from and to vectors:

from <- c('a','b','c','d','e','f')
to <- c('h','i','j','k','l','m')

and then vectorize the gsub function:

gsub2 <- function(pattern, replacement, x, ...) {
  # Apply each pattern/replacement pair in turn; later pairs see earlier results
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, ...)
  }
  x
}

Which gives:

> df <- data.frame(var1 = c("aabbcdefg", "aabbcdefg"))
> df$var1 <- gsub2(from, to, df$var1)
> df
       var1
1 hhiijklmg
2 hhiijklmg
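Because gsub2 forwards `...` to gsub, arguments such as `fixed = TRUE` can be passed through when the patterns contain regex metacharacters. A small sketch, repeating the function so it runs on its own (the input string "a.b" and the pattern vectors are illustrative assumptions):

```r
# Same helper as above, repeated so this block is self-contained
gsub2 <- function(pattern, replacement, x, ...) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, ...)
  }
  x
}

# Without fixed = TRUE, "." is a regex that matches any character
as_regex <- gsub2(c(".", "b"), c("-", "i"), "a.b")

# With fixed = TRUE, "." is treated as a literal dot
as_literal <- gsub2(c(".", "b"), c("-", "i"), "a.b", fixed = TRUE)

print(as_regex)
print(as_literal)
```

Keep in mind that the replacements are applied in order, so a later pattern can match text produced by an earlier replacement.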
Jean-Robert
  • @jrara How should I modify the code to make the replacement conditional? In the following example, I want to replace Mech, Oper and Eng only when they appear as abbreviations, not inside complete words (i.e. not Mech in Mechanical, or Oper in Operations). Here is the example: `from <- c("Mech", "Oper", "Eng"); to <- c("Mechanical", "Operations", "Engineer"); data.frame(var1 = c("Mech", "Mechanical Engineer", "Oper", "Operations"))` – vatodorov Aug 14 '13 at 18:48
  • Should be a standard function, Great! – Huub Hoofs Oct 30 '13 at 15:14
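One possible way to handle the whole-word case raised in the comments is to wrap each pattern in `\\b` word boundaries. This is only a sketch under the assumption that the abbreviations are delimited by non-word characters or the ends of the string; `perl = TRUE` is used here so that `\\b` follows PCRE semantics, and the extra "Mech Eng" element is added for illustration:

```r
from <- c("Mech", "Oper", "Eng")
to <- c("Mechanical", "Operations", "Engineer")

# Input from the comment, plus one illustrative extra element
x <- c("Mech", "Mechanical Engineer", "Oper", "Operations", "Mech Eng")

out <- x
for (i in seq_along(from)) {
  # \\b anchors the match to word boundaries, so "Mech" inside
  # "Mechanical" is left alone
  out <- gsub(paste0("\\b", from[i], "\\b"), to[i], out, perl = TRUE)
}

print(out)
```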
11

If you don't want to use chartr because the substitutions may be more than one character long, another option is gsubfn from the gsubfn package (it is not gsub itself, but an extension of it). Here is one example:

> library(gsubfn)
> tmp <- list(a='apple',b='banana',c='cherry')
> gsubfn('.', tmp, 'a.b.c.d')
[1] "apple.banana.cherry.d"

The replacement can also be a function that would take the match and return the replacement value for that match.
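For readers without gsubfn installed, the same lookup-style replacement can be approximated in base R with gregexpr and `regmatches<-`. This is a swapped-in sketch, not how gsubfn is implemented; the name `lookup` and the pattern "[a-z]" are assumptions made for the illustration. The anonymous function plays the role described above: it receives the matches and returns their replacements.

```r
x <- "a.b.c.d"

# Hypothetical lookup table: names are matches, values are replacements
lookup <- c(a = "apple", b = "banana", c = "cherry")

# Locate every single lowercase letter in x
m <- gregexpr("[a-z]", x)

# Replace each match that has an entry in the table; keep the rest unchanged
regmatches(x, m) <- lapply(regmatches(x, m), function(hit) {
  ifelse(hit %in% names(lookup), lookup[hit], hit)
})

print(x)
```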

G. Grothendieck
Greg Snow