I have the following column in a data frame, reproduced here as a vector:

    geneIDsfull = c("TRINITY_DN11579_c0_g1_i1^LAC17_ARATH^Cu-oxidase_3", "TRINITY_DN44579_c0_g1_i4", "TRINITY_DN113_c0_g1_i2")

What I would like to do is place the Trinity ID (`TRINITY_DN11579_c0_g1_i1`) in one column and any annotation (`LAC17_ARATH^Cu-`) in a separate column. Each annotation follows a caret (`^`). I tried the following:

    sub("^.*", "", geneIDsfull)

but it didn't work: every element came back as an empty string. Any help is appreciated. Thanks.
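A minimal sketch of the two-column split, assuming the column has been pulled out as the vector `geneIDsfull` shown above. The data frame `genes` and its column names `TrinityID` and `Annotation` are hypothetical. It splits on the caret with `strsplit(..., fixed = TRUE)`, so the caret is matched literally and no regex escaping is needed. The Trinity ID is the first field, the second field is the text between the first and second caret, and IDs with no annotation get `NA`.

```r
# Example vector from the question
geneIDsfull <- c("TRINITY_DN11579_c0_g1_i1^LAC17_ARATH^Cu-oxidase_3",
                 "TRINITY_DN44579_c0_g1_i4",
                 "TRINITY_DN113_c0_g1_i2")

# fixed = TRUE treats "^" as a literal caret, not a start-of-string anchor
parts <- strsplit(geneIDsfull, "^", fixed = TRUE)

# Field 1 is always the Trinity ID; field 2 (if present) is the text between
# the first and second caret. Indexing past the end of a vector gives NA.
genes <- data.frame(
  TrinityID  = vapply(parts, `[`, character(1), 1),  # hypothetical column name
  Annotation = vapply(parts, `[`, character(1), 2),  # hypothetical column name
  stringsAsFactors = FALSE
)
genes
```

If the IDs live in a data frame column (say a hypothetical `df$geneIDsfull`), the same calls work on that column directly.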

  • Just escape the metacharacter: `sub("\\^.*", "", geneIDsfull)`. Without the escape, `^` marks the start of the string and `.*` matches any run of characters, so the pattern selects everything from the start and replaces it with an empty string. Essentially, the output will be blank. – akrun Jun 16 '20 at 00:52
  • I'm still having trouble extracting this part of the annotation: `^LAC17_ARATH^Cu-oxidase_3`. Several entries have multiple `^` in the annotation, but for each one I only need what is between the first and second `^` ("LAC17_ARATH"). – Patrick Thomas Jun 16 '20 at 01:29
  • In that case, capture it as a group: `str1 <- "^LAC17_ARATH^Cu-oxidase_3"; sub("\\^([^\\^]+)\\^.*", "\\1", str1) # [1] "LAC17_ARATH"` – akrun Jun 16 '20 at 18:31
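The two comments above can be checked side by side with base R `sub()` and `grepl()` on the example vector; this is a sketch, and the variable names are illustrative. Note that the group-capture pattern in the last comment assumes the string starts with a caret (as `str1` does). Applied to the full IDs, it would leave the Trinity ID glued to the annotation, so the pattern below adds a leading `[^^]*` to consume the ID first. Elements with no caret do not match, so they are set to `NA` instead of being passed through unchanged.

```r
geneIDsfull <- c("TRINITY_DN11579_c0_g1_i1^LAC17_ARATH^Cu-oxidase_3",
                 "TRINITY_DN44579_c0_g1_i4",
                 "TRINITY_DN113_c0_g1_i2")

# Unescaped: "^" anchors at the start and ".*" consumes everything -> ""
unescaped <- sub("^.*", "", geneIDsfull)

# Escaped: "\\^" is a literal caret; drop it and everything after it
trinity_id <- sub("\\^.*", "", geneIDsfull)

# Capture the text between the first and second caret.
# [^^] means "any character except a caret".
has_annot <- grepl("^", geneIDsfull, fixed = TRUE)
annotation <- ifelse(has_annot,
                     sub("^[^^]*\\^([^^]+).*$", "\\1", geneIDsfull),
                     NA_character_)

data.frame(trinity_id, annotation)
```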