1

Suppose I have a string x like so.

x <- "CTTTANNNNNNNYG"

I would like to replace each letter in x with a different string that may not be of the same length.

a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)","(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

Replacing each letter found in vector a with the corresponding element of vector b should turn string x into:

"CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

I've tried using mapply(gsub, a,b,x) and str_replace() to no avail. Any help would be appreciated.

alki
  • Possible duplicate of http://stackoverflow.com/questions/26171318/regex-for-preserving-case-pattern-capitalization/2617170 – thelatemail Feb 24 '16 at 06:17

3 Answers

4

We can use mgsub from the qdap package:

library(qdap)
mgsub(a, b, x)
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
akrun
4

Since the replacements are "fixed" and each involve just one letter, you can achieve the same result without using either regex or any additional packages. For instance:

vapply(strsplit(x,"",fixed=TRUE),function(z) paste(setNames(b,a)[z],collapse=""),"")
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
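
To make the one-liner easier to follow, here is a sketch of the same idea split into steps. The names lookup, chars and result are only illustrative and are not part of the original answer.

```r
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")
x <- "CTTTANNNNNNNYG"

# Named vector used as a lookup table: names are the codes in a,
# values are the replacement strings in b
lookup <- setNames(b, a)

# Split the string into single characters
chars <- strsplit(x, "", fixed = TRUE)[[1]]

# Index the lookup table by name, then paste the pieces back together
result <- paste(lookup[chars], collapse = "")
result
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
```

One caveat with this approach: a character of x that does not appear in a looks up as NA, which paste turns into the literal text "NA", so it assumes every letter of x is covered by a.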
nicola
2

If you want to do this with base functions, you basically need to apply each replacement sequentially (gsub is vectorized over x, but not over its pattern and replacement arguments). Here's one way to do that:

Reduce(
    function(x, replace) {
        gsub(replace$pattern, replace$value, x)
    }, 
    Map(function(a,b) list(pattern=a, value=b), a, b), 
    init=x
)
# [1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

We use Map to make pairs of match/replace values and then sequentially apply them to the string with Reduce.
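
As a rough illustration of what "sequentially" means here, the sketch below first shows why the mapply(gsub, a, b, x) attempt from the question falls short, and then writes the Reduce call as a plain loop. The names separate and result are only illustrative.

```r
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")
x <- "CTTTANNNNNNNYG"

# mapply applies every pattern/replacement pair to the ORIGINAL x,
# so it returns one partially replaced string per pair
separate <- mapply(gsub, a, b, x)
length(separate)
#[1] 15
separate[["Y"]]
#[1] "CTTTANNNNNNN(C|T)G"

# Sequential version: each gsub call works on the result of the previous one
result <- x
for (i in seq_along(a)) {
  result <- gsub(a[i], b[i], result)
}
result
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
```

Chaining the replacements is safe for this particular table because the replacement strings contain only the letters A, C, G and T, which map to themselves. With a table where a replacement contains a letter that is matched by a later pattern, the sequential approach would replace it again.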

MrFlick