72

Of course, I could replace specific characters one at a time like this:

    mydata=c("á","é","ó")
    mydata=gsub("á","a",mydata)
    mydata=gsub("é","e",mydata)
    mydata=gsub("ó","o",mydata)
    mydata

But surely there is an easier way to do this all in one line, right? I don't find the gsub help very comprehensive on this.

Jilber Urbina
Joschi
  • If you wanted to replace different patterns with the same thing, it should be possible with `lapply`, but as you want to replace different patterns with different strings, I think you will still have to specify these one way or another... – juba Mar 06 '13 at 17:33
  • You might be able to use `chartr` to do this. – Andrie Mar 06 '13 at 17:41
  • The `gsubfn` function in the `gsubfn` package is a generalization of `gsub` that can do that in one call: `gsubfn(".", list("á"="a", "é"="e", "ó"="o"), c("á","é","ó"))` – G. Grothendieck Mar 06 '13 at 20:39
  • @G.Grothendieck That's great, and it also works for all types of characters. Very valuable comment. Thank you! – Joschi Mar 07 '13 at 10:16
  • For people searching for a more general solution to this question, here is a more helpful answer: http://stackoverflow.com/a/7664655/1036500 – Ben Jun 26 '14 at 13:33
  • @G.Grothendieck would you also post this as an answer so that future visitors see it as such? – Sam Firke May 09 '15 at 14:33

11 Answers

84

Use the character translation function

chartr("áéó", "aeo", mydata)
kith
  • That's cool for characters... But does this also work with special characters, e.g. underscores, points, etc.? It's not within the question, but it would still be interesting to know for this case too... – Joschi Mar 06 '13 at 17:49
  • @Joschi, your question doesn't talk about it. I think you'll have to escape them because they are special characters... – Arun Mar 06 '13 at 18:55
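
On the follow-up about underscores and points: as far as I know, `chartr` translates characters literally rather than as a regular expression, so those two need no escaping. A minimal sketch (the sample strings and the name `translated` are made up for illustration):

```r
# chartr() maps each character of `old` to the character at the same
# position in `new`; "_" and "." are treated as plain characters here.
x <- c("my_file.name", "a.b_c")
translated <- chartr("_.", "+:", x)
translated
# [1] "my+file:name" "a:b+c"
```

The limitation is that `chartr` only maps single characters to single characters, so it cannot replace one character with several. Also note that a `-` inside the specification denotes a range (as in `"a-c"`), so a literal dash needs some care.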
33

An interesting question! I think the simplest option is to devise a special function, something like a "multi" gsub():

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern)!=length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in 1:length(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

Which gives me:

> mydata <- c("á","é","ó")
> mgsub(c("á","é","ó"), c("a","e","o"), mydata)
[1] "a" "e" "o"
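
Because extra arguments are passed through `...` to `gsub`, the same function can also cope with characters that are special in regular expressions. A small sketch, assuming the patterns should be matched literally (`fixed = TRUE`); the function is repeated so the block runs on its own, with `seq_along` swapped in for `1:length(pattern)` so that empty input is handled safely. The input string and the result names are illustrative only:

```r
mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern) != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  # Apply the replacements one after another, in the order given.
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# As a regular expression, "." matches every character:
regex_result <- mgsub(c(".", "_"), c("-", " "), "a.b_c")
regex_result
# [1] "-----"

# With fixed = TRUE the patterns are taken literally:
cleaned <- mgsub(c(".", "_"), c("-", " "), "a.b_c", fixed = TRUE)
cleaned
# [1] "a-b c"
```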
Theodore Lytras
28

Maybe this can be useful:

iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")
[1] "aeoAEOca"
Rcoster
  • On the most current version of R that I'm using the call `iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")` returns `"'a'e'o'A'E'Oc~a"`. Did the behavior change across R versions, or does this have to do with my default encoding? – aaron Jul 12 '16 at 14:55
  • @Aaron: I don't know if it is an encoding problem. I tried it here on R 3.3.1 and it worked as expected. – Rcoster Jul 13 '16 at 17:45
19

You can use the stringi package to replace these characters.

> library(stringi)
> stri_trans_general(c("á","é","ó"), "latin-ascii")

[1] "a" "e" "o"
Maciej
11

This is very similar to @kith's answer, but in function form, and covering the most common diacritic cases:

removeDiscritics <- function(string) {
  chartr(
     "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    ,"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    , string
  )
}


removeDiscritics("test áéíóú")

"test aeiou"

Murta
7

Another mgsub implementation using Reduce

mystring = 'This is good'
myrepl = list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring){
  gsub2 <- function(l, x){
   do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}
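
The answer stops short of calling the function. A usage sketch, with the definitions repeated so it runs standalone (the name `replaced` is just for illustration); note that with `right = TRUE`, `Reduce` starts from the last pair, so `'i'` is replaced before `'o'`:

```r
mystring <- 'This is good'
myrepl <- list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring) {
  # Apply one (pattern, replacement) pair to the accumulated string.
  gsub2 <- function(l, x) {
    do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}

replaced <- mgsub2(myrepl, mystring)
replaced
# [1] "Thns ns gaad"
```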
Ramnath
7

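A problem with some of the implementations above (e.g., Theodore Lytras's) is that if the patterns are multiple characters, they may conflict when one pattern is a substring of another, or when text produced by an earlier replacement is matched by a later pattern. A way to solve this is to create a copy of the object, match every pattern against the original, and write the replacements into the copy. Note that this version replaces each matching element as a whole rather than substituting within it. This is implemented in my package bayesbio, available on CRAN.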

mgsub <- function(pattern, replacement, x, ...) {
  n = length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result = x
  for (i in 1:n) {
    result[grep(pattern[i], x, ...)] = replacement[i]
  }
  return(result)
}

Here is a test case:

  asdf = c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)

  res = mgsub(c("0", "1", "2"), c("10", "11", "12"), asdf)
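
To make the difference concrete, here is a sketch comparing the two approaches on this test case. Both functions are restated under hypothetical names (`mgsub_sequential`, `mgsub_whole`) so the block is self-contained; these names, and `pat`, `repl`, `seq_res` and `whole_res`, are not from the answers themselves.

```r
# Sequential version: each gsub() sees the output of the previous one.
mgsub_sequential <- function(pattern, replacement, x, ...) {
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# Copy-based version: patterns are always matched against the original x,
# and every matching element is overwritten as a whole.
mgsub_whole <- function(pattern, replacement, x, ...) {
  result <- x
  for (i in seq_along(pattern)) {
    result[grep(pattern[i], x, ...)] <- replacement[i]
  }
  result
}

asdf <- c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)
pat  <- c("0", "1", "2")
repl <- c("10", "11", "12")

seq_res <- mgsub_sequential(pat, repl, asdf)
seq_res
# [1] "4"   "110" "11"  "11"  "3"   "110" "12"  "110" "11"  "11"

whole_res <- mgsub_whole(pat, repl, asdf)
whole_res
# [1] "4"  "10" "11" "11" "3"  "10" "12" "10" "11" "11"
```

In the sequential version, the "1" introduced by replacing "0" with "10" is itself replaced in the next step, which is where "110" comes from.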
Andy McKenzie
3

Not so elegant, but it works and does what you want

> diag(sapply(1:length(mydata), function(i, x, y) {
+   gsub(x[i],y[i], x=x)
+ }, x=mydata, y=c('a', 'b', 'c')))
[1] "a" "b" "c"
Jilber Urbina
3

Related to Justin's answer:

> m <- c("á"="a", "é"="e", "ó"="o")
> m[mydata]
  á   é   ó 
"a" "e" "o" 

And you can get rid of the names with names(*) <- NULL if you want.

Dthal
1

You can use the match function. Here match(x, y) returns, for each element of x, the index of its first match in y. You can then use the returned indices to subset another vector (say z) that contains the replacements, ordered to line up with y. In your case:

mydata <- c("á","é","ó")
desired <- c('a', 'e', 'o')

desired[match(mydata, mydata)]

As a simpler example, consider the situation below, where I was trying to substitute 'alpha' for 'a', 'beta' for 'b', and so forth.

x <- c('a', 'a', 'b', 'c', 'b', 'c', 'e', 'e', 'd')

y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

z[match(x, y)]
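
One thing to watch with this approach: `match` returns `NA` for values that have no entry in the lookup vector, so those elements come out as `NA`. A sketch of one way to leave unmatched values unchanged instead (the names `keys`, `values`, `idx` and `out` are illustrative):

```r
x      <- c("a", "b", "z")            # "z" has no replacement defined
keys   <- c("a", "b", "c")
values <- c("alpha", "beta", "gamma")

idx <- match(x, keys)                 # NA where x is not found in keys
out <- ifelse(is.na(idx), x, values[idx])
out
# [1] "alpha" "beta"  "z"
```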
justin1.618
0

You can also combine them with gsub:

mydata <- gsub("á","a", gsub("é","e", gsub("í","i", gsub("ó","o", gsub("ú","u", mydata)))))

Maria