
I am trying to isolate a portion of a string in R. The strings have the form ABC_constantStuff_ABC_randomStuff, and ABC is what I am trying to extract. ABC is unknown and can be 1-3 characters long. I've been trying grep and gsub, but I am unsure how to specify the regular expression in code like this:

str <- 'GDP\" title=\"GDP - News\"></a>"'
symbol <- gsub(pattern,'',str)

Here GDP is unknown and can be 1-3 characters long, and \" title=\" is constant in every string. I would like to remove \" title=\"GDP - News\"></a>" so that only GDP remains.

Thank you for help in advance.

rrs
Jørgen

2 Answers


A simple one is

R> gsub("^([A-Z]*)_.*", "\\1", "ABC_constantStuff_ABC_randomStuff")
[1] "ABC"
R> 

which captures the uppercase letters at the start of the string, up to the first _.

Another one, assuming _ is your separator, is

R> strsplit( "ABC_constantStuff_ABC_randomStuff", "_")[[1]][c(1,3)]
[1] "ABC" "ABC"
R> 
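
Neither snippet above uses the quoted string from the question. Here is a minimal sketch of the same two ideas applied to it. It assumes the symbol is 1-3 uppercase letters (the question only says "characters") and is always followed by the constant text " title=". The names symbol and first_piece are just illustrative.

```r
# String from the question; inside single quotes, \" is simply a double quote.
str <- 'GDP\" title=\"GDP - News\"></a>"'

# Capture-group approach: keep 1-3 leading uppercase letters, but only when
# the constant text '" title="' follows them (assumption, see above).
symbol <- sub('^([A-Z]{1,3})" title=".*$', "\\1", str)
print(symbol)

# Split approach: treat the double quote as the separator and keep the
# first piece. fixed = TRUE makes the separator a literal string, not a regex.
first_piece <- strsplit(str, '"', fixed = TRUE)[[1]][1]
print(first_piece)
```

If the pattern does not match, sub returns the input unchanged, so a check such as nchar(symbol) <= 3 can flag strings that do not have the expected shape.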
Dirk Eddelbuettel
  • 360,940

Does this help?

> sub("\".*$", "", str)
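
The pattern "\".*$" matches from the first double quote to the end of the string, so replacing it with "" leaves only what comes before that quote. Since sub is vectorized, the same call works on a whole character vector. A small sketch, where the CPI and M2 strings are made-up inputs in the same shape as the one in the question:

```r
# First element is the string from the question; the other two are
# hypothetical examples with 3- and 2-character symbols.
strs <- c('GDP\" title=\"GDP - News\"></a>"',
          'CPI\" title=\"CPI - News\"></a>"',
          'M2\" title=\"M2 - News\"></a>"')

# Drop everything from the first double quote onward in each element.
symbols <- sub("\".*$", "", strs)
print(symbols)
```

This makes no assumption about which characters the symbol contains, only that the symbol itself has no double quote in it.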
Arun