42

I have a string variable containing lowercase letters [a-z], spaces [ ], and apostrophes ['], e.g. x <- "a'b c". I want to remove every apostrophe ['] and replace every space [ ] with an underscore [_].

x <- gsub("'", "", x)
x <- gsub(" ", "_", x)

This works, but once there are many such rules the code becomes ugly. I therefore wanted to use chartr(), but chartr() cannot replace a character with nothing, e.g.

x <- chartr("' ", "_", x) 
#Error in chartr("' ", "_", "a'b c") : 'old' is longer than 'new'

Is there any way to solve this problem? Thanks!

Maël
Eric Chang
  • 1
    You've already solved it with the two gsubs. If it looks too ugly, you can create your own wrapper function that is "prettier" I suppose. But you can't use `chartr` because "blank" isn't a character, it's the lack of a character. – MrFlick Nov 27 '15 at 04:07
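The wrapper suggested in the comment above could look roughly like this. It is only a sketch: `multi_gsub` is a hypothetical name, not an existing function, and it assumes literal (non-regex) matching with the pairs applied in the order given.

```r
# Hypothetical helper (not part of base R): apply each old -> new pair in turn.
multi_gsub <- function(x, replacements) {
  # `replacements` is a named character vector: the names are the literal
  # strings to find, the values are what replaces them ("" deletes the match).
  for (old in names(replacements)) {
    x <- gsub(old, replacements[[old]], x, fixed = TRUE)
  }
  x
}

x <- "a'b c"
result <- multi_gsub(x, c("'" = "", " " = "_"))
result
# [1] "ab_c"
```

Because the pairs are applied one after another, a later rule sees the output of an earlier one, exactly as with the two separate gsub() calls.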

10 Answers

44

You can use gsubfn

library(gsubfn)
gsubfn(".", list("'" = "", " " = "_"), x)
# [1] "ab_c"

Similarly, we can use mgsub, which replaces multiple search patterns with their corresponding replacements in one call:

mgsub::mgsub(x, c("'", " "), c("", "_"))
#[1] "ab_c"
Ronak Shah
  • How can I make it so I can call the old text inside the replacement text (for example, the `do |word|` action for regular gsub)? – Guest2819 Jan 10 '21 at 23:53
  • @Guest2819 I don't think I understand. Better would be to create a new question showing example and expected output. – Ronak Shah Jan 11 '21 at 01:34
36

I am a fan of the syntax that the %<>% and %>% operators from the magrittr package provide.

library(magrittr)

x <- "a'b c"

x %<>%
  gsub("'", "", .) %>%
  gsub(" ", "_", .) 
x
##[1] "ab_c"

gsubfn is wonderful, but I like the chaining that %>% allows.

Peter
  • 3
    what is the meaning of third argument "." in gsub() – Ali Jul 21 '17 at 00:21
  • 6
    By default, the object on the left-hand-side of `%>%` is *piped* to the first argument on the right-hand-side. If the LHS needs to be, as in this example, the third argument, then the `.` is the placeholder. See `vignette("magrittr")` for more details. – Peter Jul 21 '17 at 05:48
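To illustrate the placeholder described above, here is a small sketch (assuming magrittr is installed) that compares piping with and without the `.`:

```r
library(magrittr)

x <- "a'b c"

# With the placeholder, x lands in the third argument: gsub("'", "", x)
with_placeholder <- x %>% gsub("'", "", .)

# Without it, x becomes the FIRST argument, i.e. the pattern:
# gsub(x, "'", ""), which searches the empty string "" and returns it unchanged
without_placeholder <- x %>% gsub("'", "")

with_placeholder
# [1] "ab c"
without_placeholder
# [1] ""
```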
26

I'd go with stri_replace_all_fixed from the stringi package, which is quite fast:

library(stringi)    
stri_replace_all_fixed("a'b c", pattern = c("'", " "), replacement = c("", "_"), vectorize_all = FALSE)

Here is a benchmark taking into account most of the other suggested solutions:

library(stringi)
library(microbenchmark)
library(gsubfn)
library(mgsub)
library(magrittr)
library(dplyr)

x_gsubfn <-
x_mgsub <-
x_nested_gsub <-
x_magrittr <-
x_stringi <- "a'b c"

microbenchmark("gsubfn" = { gsubfn(".", list("'" = "", " " = "_"), x_gsubfn) },
               "mgsub" = { mgsub::mgsub(x_mgsub, c("'", " "), c("", "_")) },
               "nested_gsub" = { gsub("Find", "Replace", gsub("Find","Replace", x_nested_gsub)) },
               "magrittr" = { x_magrittr %<>% gsub("'", "", .) %>% gsub(" ", "_", .) },
               "stringi" = { stri_replace_all_fixed(x_stringi, pattern = c("'", " "), replacement = c("", "_"), vectorize_all = FALSE) }
               )

Unit: microseconds
        expr     min       lq      mean   median       uq     max neval
      gsubfn 458.217 482.3130 519.12820 513.3215 538.0100 715.371   100
       mgsub 180.521 200.8650 221.20423 216.0730 231.6755 460.587   100
 nested_gsub  14.615  15.9980  17.92178  17.7760  18.7630  40.687   100
    magrittr 113.765 133.7125 148.48202 142.9950 153.0680 296.261   100
     stringi   3.950   7.7030   8.41780   8.2960   9.0860  26.071   100
ismirsehregal
24

I know this is a bit old, but it is hard to pass up an efficient base R solution. Just use the regex alternation operator | in the pattern:

test <- "abcegdfk461mnb"
test2 <- gsub("e|4|6","",test)
print(test2)
Kapo
  • Best answer here. – Brad Mar 18 '22 at 01:51
  • 2
    @Brad - No it's not. This doesn't answer the question. OP asked to replace multiple strings with different replacements. The above only has a single replacement string for multiple patterns. – ismirsehregal Aug 17 '22 at 12:32
  • 1
    Good point, still nice bit of code. Many people didn't know you can declare multiple substrings in gsub using | – Brad Aug 22 '22 at 00:58
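As a small illustration of the point made in these comments: for single characters, alternation and a character class should give the same result, and in both cases every match gets the same replacement string, so this cannot map different patterns to different replacements.

```r
test <- "abcegdfk461mnb"

# Alternation: match "e" or "4" or "6"
alt <- gsub("e|4|6", "", test)

# Character class: match any one of e, 4, 6
cls <- gsub("[e46]", "", test)

alt
# [1] "abcgdfk1mnb"
identical(alt, cls)
# [1] TRUE
```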
7

I think nested gsub will do the job.

gsub("Find","Replace",gsub("Find","Replace",X))
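Applied to the string from the question, the nested form would look something like the following. The use of `fixed = TRUE` is my own assumption here, chosen so that both patterns are treated as literal text rather than regular expressions.

```r
x <- "a'b c"

# The inner call removes apostrophes; the outer call turns spaces into underscores.
result <- gsub(" ", "_", gsub("'", "", x, fixed = TRUE), fixed = TRUE)
result
# [1] "ab_c"
```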
Patrick W
Atul
2

I would opt for a magrittr and/or dplyr solution, as well. However, I prefer not making a new copy of the object, especially if it is in a function and can be returned cheaply.

i.e.

return(
  catInTheHat %>% gsub('Thing1', 'Thing2', .) %>% gsub('Red Fish', 'Blue Fish', .)
)

...and so on.

d8aninja
1
gsub("\\s", "", chartr("' ", " _", x)) # Use whitespace and then remove it
zhan2383
1

Try this to replace multiple strings in a column (str_replace_all comes from the stringr package):

df$TYPE <- str_replace_all(df$TYPE, c("test" = "new_test", "G" = "N", "T" = "W"))
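For the string in the question, the same named-vector form would presumably look like this (a sketch assuming stringr is installed; note that the names are interpreted as regular expressions and applied in order):

```r
library(stringr)

x <- "a'b c"

# Names are the patterns, values are the replacements ("" deletes the match).
result <- str_replace_all(x, c("'" = "", " " = "_"))
result
# [1] "ab_c"
```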
Tomerikoo
Rupesh Kumar
0

I use this function, which also allows omitting the argument for the replacement if the replacement is empty:

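s <- function(x, ..., ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE) {
  # Capture the pattern/replacement arguments passed through ...
  a <- match.call(expand.dots = FALSE)$...
  l <- length(a)
  # Walk the arguments in pairs; a trailing pattern without a replacement is deleted
  for (i in seq(1, l, 2)) {
    x <- gsub(a[[i]], if (i == l) "" else a[[i + 1]], x,
              ignore.case = ignore.case, perl = perl, fixed = fixed, useBytes = useBytes)
  }
  x
}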
> s("aa bb cc","aa","dd","bb")
[1] "dd  cc"
nisetama
0

You got the error 'old' is longer than 'new' because "' " (an apostrophe followed by a space) has length 2, while "_" has length 1. If you add another _ so that new is as long as old, your code runs, although the apostrophe is then replaced with an underscore rather than removed:

chartr("' ", "__", "a'b c")
#[1] "a_b_c"
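Since chartr() only maps one character to another, one possible way to still get the output the question asks for is to let chartr() handle the one-to-one part and do the deletion separately. This is only a sketch of that idea:

```r
x <- "a'b c"

# Deletion is not a character-to-character mapping, so use gsub() for it.
no_apostrophes <- gsub("'", "", x, fixed = TRUE)

# chartr() handles the one-to-one translation: space -> underscore.
result <- chartr(" ", "_", no_apostrophes)
result
# [1] "ab_c"
```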
Maël