As the title states, I am trying to use gsub with a vector for both the "pattern" and the "replacement". Currently, I have code that looks like this:

  names(x1) <- gsub("2110027599", "Inv1", names(x1)) #x1 is a data frame
  names(x1) <- gsub("2110025622", "Inv2", names(x1))
  names(x1) <- gsub("2110028045", "Inv3", names(x1))
  names(x1) <- gsub("2110034716", "Inv4", names(x1))
  names(x1) <- gsub("2110069349", "Inv5", names(x1))
  names(x1) <- gsub("2110023264", "Inv6", names(x1))

What I hope to do is something like this:

  a <- c("2110027599","2110025622","2110028045","2110034716", "2110069349", "2110023264")
  b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")
  names(x1) <- gsub(a,b,names(x1))

I'm guessing there is an apply function somewhere that can do this, but I am not very sure which one to use!

EDIT: names(x1) looks like this (There are many more columns, but I'm leaving them out):

> names(x1)
  [1] "2110023264A.Ms.Amp"        "2110023264A.Ms.Vol"        "2110023264A.Ms.Watt"       "2110023264A1.Ms.Amp"      
  [5] "2110023264A2.Ms.Amp"       "2110023264A3.Ms.Amp"       "2110023264A4.Ms.Amp"       "2110023264A5.Ms.Amp"      
  [9] "2110023264B.Ms.Amp"        "2110023264B.Ms.Vol"        "2110023264B.Ms.Watt"       "2110023264B1.Ms.Amp"      
 [13] "2110023264Error"           "2110023264E-Total"         "2110023264GridMs.Hz"       "2110023264GridMs.PhV.phsA"
 [17] "2110023264GridMs.PhV.phsB" "2110023264GridMs.PhV.phsC" "2110023264GridMs.TotPFPrc" "2110023264Inv.TmpLimStt"  
 [21] "2110023264InvCtl.Stt"      "2110023264Mode"            "2110023264Mt.TotOpTmh"     "2110023264Mt.TotTmh"      
 [25] "2110023264Op.EvtCntUsr"    "2110023264Op.EvtNo"        "2110023264Op.GriSwStt"     "2110023264Op.TmsRmg"      
 [29] "2110023264Pac"             "2110023264PlntCtl.Stt"     "2110023264Serial Number"   "2110025622A.Ms.Amp"       
 [33] "2110025622A.Ms.Vol"        "2110025622A.Ms.Watt"       "2110025622A1.Ms.Amp"       "2110025622A2.Ms.Amp"      
 [37] "2110025622A3.Ms.Amp"       "2110025622A4.Ms.Amp"       "2110025622A5.Ms.Amp"       "2110025622B.Ms.Amp"       
 [41] "2110025622B.Ms.Vol"        "2110025622B.Ms.Watt"       "2110025622B1.Ms.Amp"       "2110025622Error"          
 [45] "2110025622E-Total"         "2110025622GridMs.Hz"       "2110025622GridMs.PhV.phsA" "2110025622GridMs.PhV.phsB"

What I hope to get is this:

> names(x1)
  [1] "Inv6A.Ms.Amp"        "Inv6A.Ms.Vol"        "Inv6A.Ms.Watt"       "Inv6A1.Ms.Amp"       "Inv6A2.Ms.Amp"      
  [6] "Inv6A3.Ms.Amp"       "Inv6A4.Ms.Amp"       "Inv6A5.Ms.Amp"       "Inv6B.Ms.Amp"        "Inv6B.Ms.Vol"       
 [11] "Inv6B.Ms.Watt"       "Inv6B1.Ms.Amp"       "Inv6Error"           "Inv6E-Total"         "Inv6GridMs.Hz"      
 [16] "Inv6GridMs.PhV.phsA" "Inv6GridMs.PhV.phsB" "Inv6GridMs.PhV.phsC" "Inv6GridMs.TotPFPrc" "Inv6Inv.TmpLimStt"  
 [21] "Inv6InvCtl.Stt"      "Inv6Mode"            "Inv6Mt.TotOpTmh"     "Inv6Mt.TotTmh"       "Inv6Op.EvtCntUsr"   
 [26] "Inv6Op.EvtNo"        "Inv6Op.GriSwStt"     "Inv6Op.TmsRmg"       "Inv6Pac"             "Inv6PlntCtl.Stt"    
 [31] "Inv6Serial Number"   "Inv2A.Ms.Amp"        "Inv2A.Ms.Vol"        "Inv2A.Ms.Watt"       "Inv2A1.Ms.Amp"      
 [36] "Inv2A2.Ms.Amp"       "Inv2A3.Ms.Amp"       "Inv2A4.Ms.Amp"       "Inv2A5.Ms.Amp"       "Inv2B.Ms.Amp"       
 [41] "Inv2B.Ms.Vol"        "Inv2B.Ms.Watt"       "Inv2B1.Ms.Amp"       "Inv2Error"           "Inv2E-Total"        
 [46] "Inv2GridMs.Hz"       "Inv2GridMs.PhV.phsA" "Inv2GridMs.PhV.phsB" 
Wet Feet

6 Answers

32

Lots of solutions already; here is one more:

The qdap package:

library(qdap)
names(x1) <- mgsub(a,b,names(x1))
Tyler Rinker
  • 3
    mapply doesn't really work since gsub still doesn't work on vectors, but the qdap package works flawlessly. As such, I'm choosing this as the accepted answer. – Wet Feet Oct 17 '13 at 17:07
  • 27
    Beware: `qdap` has an enormous number of dependencies. – Richard Dec 16 '14 at 21:15
  • 5
    I'm quite aware since I wrote it. Secondly, there is no need for your warning. It's open source and this info is spelled out plainly in the first page of the documentation. – Tyler Rinker Dec 16 '14 at 21:55
  • 15
    That's like saying "there's no need to state the library used in this answer because you can always google the function". Sure, the warning might not be necessary, but it's still nice to know without doing further research. – acylam Oct 26 '17 at 20:54
32

From stringr documentation of str_replace_all, "If you want to apply multiple patterns and replacements to the same string, pass a named version to pattern."

Thus using a, b, and names(x1) from above

stringr::str_replace_all(names(x1), setNames(b, a))
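
As a small illustration of the call above, here is a sketch on a made-up two-element subset of the question's data (the shortened vectors a, b and x are assumptions for the example, not the full data). setNames(b, a) builds a named vector whose names are the patterns and whose values are the replacements. The stringr call is guarded so the snippet still runs where the package is not installed.

```r
# Shortened, illustrative versions of the vectors from the question
a <- c("2110023264", "2110025622")
b <- c("Inv6", "Inv2")
x <- c("2110023264A.Ms.Amp", "2110025622Error")

# Names are the patterns, values are the replacements
lookup <- setNames(b, a)

# Only call stringr if it is available
if (requireNamespace("stringr", quietly = TRUE)) {
  renamed <- stringr::str_replace_all(x, lookup)
  print(renamed)
}
```

Note that the patterns are treated as regular expressions by default, which makes no difference here because they contain only digits.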

EDIT

stringr::str_replace_all calls stringi::stri_replace_all_regex, which can be used directly and is quite a bit quicker.

x <- names(x1)
pattern <- a
replace <- b

microbenchmark::microbenchmark(
  str  = stringr::str_replace_all(x, setNames(replace, pattern)),
  stri = stringi::stri_replace_all_regex(x, pattern, replace, vectorize_all = FALSE)
  )

Unit: microseconds
 expr    min      lq     mean  median   uq    max neval cld
  str 1022.1 1070.45 1286.547 1175.55 1309 2526.8   100   b
 stri  145.2  150.45  190.124  160.55  178  457.9   100  a 
JWilliman
  • `str_replace_all` is close to the equivalent of the original poster's `gsub`. But `str_replace` might in some cases be preferred. – dca Mar 31 '20 at 17:08
11

New Answer

If we can make another assumption, the following should work. The assumption this time is that you are really interested in substituting the first 10 characters from each value in names(x1).

Here, I've stored names(x1) as a character vector named "X1". The solution essentially uses substr to separate the values in X1 into 2 parts, match to figure out the correct replacement option, and paste to put everything back together.

a <- c("2110027599", "2110025622", "2110028045",
       "2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")

X1pre <- substr(X1, 1, 10)
X1post <- substr(X1, 11, max(nchar(X1)))

paste0(b[match(X1pre, a)], X1post)
#  [1] "Inv6A.Ms.Amp"        "Inv6A.Ms.Vol"        "Inv6A.Ms.Watt"      
#  [4] "Inv6A1.Ms.Amp"       "Inv6A2.Ms.Amp"       "Inv6A3.Ms.Amp"      
#  [7] "Inv6A4.Ms.Amp"       "Inv6A5.Ms.Amp"       "Inv6B.Ms.Amp"       
# [10] "Inv6B.Ms.Vol"        "Inv6B.Ms.Watt"       "Inv6B1.Ms.Amp"      
# [13] "Inv6Error"           "Inv6E-Total"         "Inv6GridMs.Hz"      
# [16] "Inv6GridMs.PhV.phsA" "Inv6GridMs.PhV.phsB" "Inv6GridMs.PhV.phsC"
# [19] "Inv6GridMs.TotPFPrc" "Inv6Inv.TmpLimStt"   "Inv6InvCtl.Stt"     
# [22] "Inv6Mode"            "Inv6Mt.TotOpTmh"     "Inv6Mt.TotTmh"      
# [25] "Inv6Op.EvtCntUsr"    "Inv6Op.EvtNo"        "Inv6Op.GriSwStt"    
# [28] "Inv6Op.TmsRmg"       "Inv6Pac"             "Inv6PlntCtl.Stt"    
# [31] "Inv6Serial Number"   "Inv2A.Ms.Amp"        "Inv2A.Ms.Vol"       
# [34] "Inv2A.Ms.Watt"       "Inv2A1.Ms.Amp"       "Inv2A2.Ms.Amp"      
# [37] "Inv2A3.Ms.Amp"       "Inv2A4.Ms.Amp"       "Inv2A5.Ms.Amp"      
# [40] "Inv2B.Ms.Amp"        "Inv2B.Ms.Vol"        "Inv2B.Ms.Watt"      
# [43] "Inv2B1.Ms.Amp"       "Inv2Error"           "Inv2E-Total"        
# [46] "Inv2GridMs.Hz"       "Inv2GridMs.PhV.phsA" "Inv2GridMs.PhV.phsB"
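
One caveat with the substr/match approach: any name whose first 10 characters are not in a gets NA from match, and paste0 then turns that into the literal string "NA". A minimal sketch of a guard, assuming you want unmatched names left unchanged (the three-element X1 and the "Timestamp" column are made up for illustration):

```r
a <- c("2110023264", "2110025622")
b <- c("Inv6", "Inv2")

# Illustrative names; "Timestamp" has no serial-number prefix
X1 <- c("2110023264A.Ms.Amp", "2110025622Error", "Timestamp")

prefix <- substr(X1, 1, 10)   # candidate serial number
idx    <- match(prefix, a)    # NA where the prefix is not in a

# Replace the prefix only where a match was found; keep other names as they are
renamed <- ifelse(is.na(idx), X1, paste0(b[idx], substring(X1, 11)))
renamed
```

substring(X1, 11) returns everything from the 11th character onward, so there is no need to compute max(nchar(X1)).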

Old Answer

If we can assume that names(x1) is in the same order as the pattern and replacement and that it is basically a one-for-one replacement, you might be able to get away with just sapply.

Here's an example of that particular situation:

Imagine "names(x1)" looks something like this:

X1 <- paste0("A2", a, sequence(length(a)))
X1
# [1] "A221100275991" "A221100256222" "A221100280453" 
# [4] "A221100347164" "A221100693495" "A221100232646"

Here's our pattern and replacement vectors:

a <- c("2110027599", "2110025622", "2110028045", 
       "2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")

This is how we might use sapply if these assumptions are valid.

sapply(seq_along(a), function(x) gsub(a[x], b[x], X1[x]))
# [1] "A2Inv11" "A2Inv22" "A2Inv33" "A2Inv44" "A2Inv55" "A2Inv66"
A5C1D2H2I1M1N2O1R2T1
3

Try mapply.

names(x1) <- mapply(gsub, a, b, names(x1), USE.NAMES = FALSE)

Or, even easier, str_replace from stringr.

library(stringr)
names(x1) <- str_replace(names(x1), a, b)
Richie Cotton
  • I posted this answer above already but here there's no need for `USE.NAMES = FALSE` – Tyler Rinker Oct 17 '13 at 12:25
  • @TylerRinker Yep, you beat me to it. `USE.NAMES = FALSE` gives a minor performance benefit that, for large datasets, may save you almost as much time as it took you to type the extra characters. – Richie Cotton Oct 17 '13 at 12:32
  • I tried str_replace_all from the stringr package, which should do as described. However, it gave me an error: Error in check_pattern(pattern, string, replacement) : Lengths of string and pattern not compatible EDIT: I realised that str_replace_all requires names(x1) to be the same length as a and b, which is the reason why it doesn't work. – Wet Feet Oct 17 '13 at 17:12
  • @WetFeet Doesn't that mean that the `str_replace` solution is wrong? I think so but unsure since no one has raised that yet. – Heisenberg Oct 24 '14 at 01:12
  • 1
    I don't think `mapply` works either, since it does not recursively apply gsub to names(x1) – Heisenberg Oct 24 '14 at 02:40
  • Tested these and they both don't work. – Caspar V. Jun 23 '22 at 12:42
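
The objections in the comments are easy to check on a toy example (the four names below are made up for illustration). mapply pairs the i-th pattern with the i-th name, recycling a and b as needed, so each name is tested against only one pattern instead of all of them:

```r
a <- c("2110023264", "2110025622")
b <- c("Inv6", "Inv2")
x <- c("2110023264A.Ms.Amp", "2110023264B.Ms.Amp",
       "2110025622A.Ms.Amp", "2110025622B.Ms.Amp")

# Element-wise pairing: (a[1], x[1]), (a[2], x[2]), (a[1], x[3]), (a[2], x[4])
paired <- mapply(gsub, a, b, x, USE.NAMES = FALSE)
paired
# [1] "Inv6A.Ms.Amp"       "2110023264B.Ms.Amp" "2110025622A.Ms.Amp" "Inv2B.Ms.Amp"

# Applying every pattern to every name gives the intended result
looped <- x
for (i in seq_along(a)) {
  looped <- gsub(a[i], b[i], looped)
}
looped
# [1] "Inv6A.Ms.Amp" "Inv6B.Ms.Amp" "Inv2A.Ms.Amp" "Inv2B.Ms.Amp"
```

Only the first and last names are renamed by mapply, and only because they happen to line up with the recycled patterns.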
3

I needed to do something similar but had to use base R. As long as your vectors are the same length, I think this will work

for (i in seq_along(a)){
  names(x1) <- gsub(a[i], b[i], names(x1))
} 
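
If you use this in several places, the same loop can be wrapped in a small helper. This is only a sketch: multi_gsub is a hypothetical name, not a base R function, and the test vectors are made up. It folds gsub over the pattern/replacement pairs with Reduce and forwards extra arguments such as fixed = TRUE to gsub.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn to x
multi_gsub <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  Reduce(
    function(acc, i) gsub(pattern[i], replacement[i], acc, ...),
    seq_along(pattern),
    x  # initial value: the untouched input vector
  )
}

a <- c("2110023264", "2110025622")
b <- c("Inv6", "Inv2")
x <- c("2110023264A.Ms.Amp", "2110025622Error", "2110023264Pac")

# fixed = TRUE treats the patterns as literal strings rather than regexes
result <- multi_gsub(a, b, x, fixed = TRUE)
result
```

Because the replacements are applied sequentially, the output of an earlier replacement can be matched by a later pattern; that is harmless for the serial numbers here but worth keeping in mind for other inputs.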
Jenna Allen
2

Somehow names<- and match seem much more appropriate here...

names( x1 ) <- b[ match( names( x1 ) , a ) ]

But I am making the assumption that the elements of vector a are the actual names of your data.frame.

If a really is a pattern found within each of the names of x1 then this grepl approach with names<- could be useful...

new <- sapply( a , grepl , x = names( x1 ) )
names( x1 ) <- b[ apply( new , 1 , which.max ) ]
Simon O'Hanlon
  • Since the pattern is found within the names of x1, match returns NA values. grepl doesn't really work either, as it replaces the whole name rather than just the numeric part (as in the edit). – Wet Feet Oct 17 '13 at 16:42