Replace substring in one column of dataframe with value from another column in same row

Question

I have a data frame of character columns named new_sgs that looks like this:

     SG.Name RegionCode
1 AW02PASGA001         01
2 AW02PASGA002         01
3 AW02PASGA003         01
4 AW02PASGA004         01
5 AW02PASGA005         01
6 AW02PASGA006         01
...

I want to replace '02' in the strings of column 1 with the string in column 2. This does the job for row 1:

new_sgs$SG.Name[1] <- gsub("AW02", paste0("AW", new_sgs$RegionCode[1]), new_sgs$SG.Name[1])

Is there a way to make this change to every row using one of the apply functions? I've tried

sapply(new_sgs, function(x) gsub("AW02", paste0("AW", new_sgs$RegionCode[x]), new_sgs$SG.Name[x]))

but this is what I get:

    SG.Name RegionCode
[1,] NA      NA        
[2,] NA      NA        
[3,] NA      NA        
[4,] NA      NA        
[5,] NA      NA        
[6,] NA      NA 
...
Warning messages:
1: In gsub("AW02", paste0("AW", test$RegionCode[x]), test$SG.Name[x]) :
  argument 'replacement' has length > 1 and only the first element will be used
2: In gsub("AW02", paste0("AW", test$RegionCode[x]), test$SG.Name[x]) :
  argument 'replacement' has length > 1 and only the first element will be used

Thanks!

Luke

Possible duplicate of [R: gsub, pattern = vector and replacement = vector](http://stackoverflow.com/questions/19424709/r-gsub-pattern-vector-and-replacement-vector) — aosmith, Sep 08 '16 at 21:23

score 3 · Answer 1 · answered Sep 09 '16 at 02:50

str_replace() from the stringr package is vectorised over both the pattern and the replacement, which is what you need here. See the example below:

library(stringr)

x <- data.frame(
  SG.Name = c("AW02PASGA001", "AW02PASGA002", "AW02PASGA003"),
  RegionCode = c("01", "01", "01")
)

str_replace(x$SG.Name, "02", x$RegionCode)
#> [1] "AW01PASGA001" "AW01PASGA002" "AW01PASGA003"
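
One caveat with the bare pattern "02": str_replace() replaces only the first match in each string, so a name that does not have 02 at positions 3 and 4 but contains 02 later on would be changed in the wrong place. A minimal sketch, assuming every name starts with the literal prefix AW02 and using invented region codes so the row-wise behaviour is visible, anchors the pattern and writes the result back to the column:

```r
library(stringr)

# Illustrative data; the region codes here are invented for the example.
x <- data.frame(
  SG.Name = c("AW02PASGA001", "AW02PASGA002", "AW02PASGA003"),
  RegionCode = c("01", "03", "05"),
  stringsAsFactors = FALSE
)

# "^AW02" matches only at the start of each string; the replacement is
# built element by element from RegionCode.
x$SG.Name <- str_replace(x$SG.Name, "^AW02", paste0("AW", x$RegionCode))
x$SG.Name
#> [1] "AW01PASGA001" "AW03PASGA002" "AW05PASGA003"
```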

score 2 · Accepted Answer · answered Sep 08 '16 at 21:29

If the substring you want to replace is guaranteed to occupy positions 3 and 4 of SG.Name, you can simply use substr:

substr(df$SG.Name, 3, 4) <- df$RegionCode
df
#       SG.Name RegionCode
#1 AW01PASGA001         01
#2 AW01PASGA002         01
#3 AW01PASGA003         01
#4 AW01PASGA004         01
#5 AW01PASGA005         01
#6 AW01PASGA006         01
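
The substr assignment expects character vectors, and the question says the columns are character. If they were read in as factors instead (the data.frame() default before R 4.0), they would need converting first. A small sketch of that case, assuming a data frame named df as above and invented region codes:

```r
# Illustrative data; the region codes are invented for the example.
df <- data.frame(
  SG.Name = c("AW02PASGA001", "AW02PASGA002", "AW02PASGA003"),
  RegionCode = c("01", "03", "05"),
  stringsAsFactors = TRUE  # mimic the pre-R 4.0 default
)

# Convert every factor column to character before editing substrings.
df[] <- lapply(df, as.character)

# Overwrite characters 3 to 4 of each name with that row's region code.
substr(df$SG.Name, 3, 4) <- df$RegionCode
df$SG.Name
# [1] "AW01PASGA001" "AW03PASGA002" "AW05PASGA003"
```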

Alternatively, you can use sub with mapply:

df$SG.Name <- mapply(function(rc, nam) sub("\\d+", rc, nam), df$RegionCode, df$SG.Name, USE.NAMES = FALSE)

Thanks! Extra credit for both the simple vector solution and for helping me figure out the use of mapply. Note: inside sub the region code rc has to be the replacement and the name nam the string being modified, i.e. sub("\\d+", rc, nam); with nam and rc the other way round it does not work. — Luke, Sep 09 '16 at 16:31
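
For context, sapply(new_sgs, ...) in the question iterates over the columns of the data frame, not its rows, so x is a whole column and cannot serve as a row index; mapply instead walks several vectors in parallel. A self-contained sketch of the mapply approach, where replace_region is a hypothetical helper name and the region codes are invented:

```r
# Illustrative data; the region codes are invented for the example.
df <- data.frame(
  SG.Name = c("AW02PASGA001", "AW02PASGA002", "AW02PASGA003"),
  RegionCode = c("01", "03", "05"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the first run of digits in `name` with `code`.
replace_region <- function(name, code) sub("[0-9]+", code, name)

# mapply pairs up the i-th name with the i-th region code.
df$SG.Name <- mapply(replace_region, df$SG.Name, df$RegionCode, USE.NAMES = FALSE)
df$SG.Name
# [1] "AW01PASGA001" "AW03PASGA002" "AW05PASGA003"
```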
