I have a question about the use of gsub. Many of the row names in my data share the same partial names. See below:

> rownames(test)
[1] "U2OS.EV.2.7.9"   "U2OS.PIM.2.7.9"  "U2OS.WDR.2.7.9"  "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9"  "U2OS.EV.18.6.9"  "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX"   "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM"   "X5.U2OS...EV"    "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC"   "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV"    "EXP2.U2OS.MYC"   "EXP2.U2OS.PIM1"  "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"

In my previous question, I asked whether there is a way to give all rows that share a partial name the same name. See this question: Replacing rownames of data frame by a sub-string

The answer is a very nice solution. The function gsub is used in this way:

 transfecties = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test)

Now I have another problem: the program I run R through (Galaxy) doesn't recognize the | character. Is there another way to get the same result without using |?

Thanks!

Lisann
  • I'm sorry but I don't understand. What program do you run with R? What errors do you get? – Andrie Jun 09 '11 at 10:28
  • I run R in Galaxy (http://main.g2.bx.psu.edu/) and I need to fill in the variable in this way: MYC|EV|PIM|WDR|OBX. But Galaxy doesn't recognize it. – Lisann Jun 09 '11 at 10:31
  • Have you tried escaping or double escaping the `|` signs? – Sacha Epskamp Jun 09 '11 at 10:35
  • Why don't you ask the Galaxy guys why this won't work in their application? The code is valid R. – Gavin Simpson Jun 09 '11 at 10:46
  • And if that doesn't work, you may have to write a function that uses a loop or if statements combined with `grep` to test for each phrase separately. But I am not going to do your work for you. – Andrie Jun 09 '11 at 10:51
  • Just to back up the comment from Gavin Simpson, I have tried it and it works fine in plain R (i.e. no galaxy). However, there is a trailing bracket missing in your code - could that be the answer? – nullglob Jun 09 '11 at 10:55
  • @Lisann : please use `c()` or dput to give us an example to work with. See also http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Joris Meys Jun 09 '11 at 10:58
  • @Joris Meys, Sorry for not giving an example to work with. – Lisann Jun 09 '11 at 11:15

2 Answers


If you don't want to use the "|" character, you can try something like:

Rnames <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
            "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9")

Rlevels <- c("MYC", "EV", "PIM", "WDR", "OBX")
# One logical column per level: TRUE where the name contains that level
tmp <- sapply(Rlevels, grepl, Rnames)
# For each row (one per name), pick the column name of the level that matched
apply(tmp, 1, function(i) colnames(tmp)[i])
[1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR"
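
Note that the apply() call above returns a plain character vector only when every name contains exactly one of the levels. As a rough sketch of a more defensive variant (the helper name `match_level` and its argument names are hypothetical, not from the question), the version below returns NA for names that match no level or more than one. It also passes `fixed = TRUE`, so the levels are treated as plain substrings rather than regular expressions:

```r
# Hypothetical helper: map each name to the single level it contains.
match_level <- function(nms, lvls) {
    # One column per level, one row per name; fixed = TRUE means plain
    # substring matching, so no regex metacharacters are involved.
    hits <- vapply(lvls, function(lv) grepl(lv, nms, fixed = TRUE),
                   logical(length(nms)))
    # vapply() returns a plain vector for a single name; force a matrix.
    hits <- matrix(hits, nrow = length(nms))
    # A row with exactly one TRUE sums to the column index of that level.
    n_hits <- rowSums(hits)
    idx <- as.vector(hits %*% seq_along(lvls))
    out <- rep(NA_character_, length(nms))
    ok <- n_hits == 1
    out[ok] <- lvls[idx[ok]]
    out
}

match_level(c("U2OS.EV.2.7.9", "X2.U2OS...MYC", "no.match.here"),
            c("MYC", "EV", "PIM", "WDR", "OBX"))
```

For the three example names this should give "EV", "MYC" and NA.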

But I would seriously consider mentioning this to the Galaxy team, as it seems rather awkward not to be able to use the symbol for OR...

Joris Meys
  • Thanks, Joris Meys. This solution works for R in combination with Galaxy. I'm definitely going to ask the Galaxy developers why it doesn't work with the | character. – Lisann Jun 09 '11 at 11:44

I wouldn't recommend doing this in general in R, as it is far less efficient than the solution @csgillespie provided. An alternative, though, is to loop over the strings you want to match and do the replacement for each one separately, i.e. search for "MYC" and replace only in those rownames that match "MYC".

Here is an example using the x data from @csgillespie's Answer:

x <-  c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
       "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
       "U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")

Copy the data so we have something to compare with later (this is just for the example):

x2 <- x

Then create a list of strings you want to match on:

matches <- c("MYC","EV","PIM","WDR","OBX")

Then we loop over the values in matches and do three things (numbered ##X in the code):

  1. Create the regular expression by pasting together the current match string i with the other bits of the regular expression we want to use.
  2. Use grepl() to return a logical indicator for those elements of x2 that contain the string i.
  3. Use the same style of gsub() call as you were already shown, but apply it only to the elements of x2 that matched the string, and replace only those elements.

The loop is:

for(i in matches) {
    rgexp <- paste(".*(", i, ").*", sep = "") ## 1
    ind <- grepl(rgexp, x2)                   ## 2
    x2[ind] <- gsub(rgexp, "\\1", x2[ind])    ## 3
}
x2

Which gives:

> x2
 [1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"
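
If the obstacle is only that a literal | cannot be typed into the Galaxy field (an assumption; nothing in this thread establishes what Galaxy actually does with the character), one possible workaround is to build the alternation at run time, which keeps the single vectorised gsub() call. A minimal sketch, in which `bar` and `pattern` are illustrative names and the data is a shortened copy of x:

```r
x <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "X1.U2OS...OBX", "X2.U2OS...MYC")
matches <- c("MYC", "EV", "PIM", "WDR", "OBX")

# intToUtf8(124) yields the one-character string for code point 124,
# the vertical bar, without that character appearing in the source.
bar <- intToUtf8(124)

# Assemble the same pattern as in the original gsub() call.
pattern <- paste0(".*(", paste(matches, collapse = bar), ").*")

gsub(pattern, "\\1", x)
```

This should return "EV" "PIM" "OBX" "MYC". Whether it helps depends on where Galaxy loses the character, so treat it as something to try rather than a confirmed fix.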
Gavin Simpson