I have many filenames which look like:
txt= "MA0051_IRF2.xml"
I want to extract IRF2, which is between "_" and ".". How do I do this in R?
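To achieve this, you need a regexp that matches the whole string and captures the part you want:
.*        matches any characters before the underscore
[_]       matches the underscore
([^.]+)   captures everything that follows, up to the next dot
[.]       matches the dot
.*        matches whatever follows the dot
In your call to gsub, you then refer to the captured group with \\1 (we need to escape the backslash in an R string, hence the double backslash).
Example: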
gsub(".*[_]([^.]+)[.].*", "\\1", "MA0051_IRF2.xml")
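Since the question mentions many filenames, here is a minimal sketch of the same pattern applied to a whole character vector. The filenames other than the first are made up for illustration, and sub is used instead of gsub because the pattern consumes the entire string, so at most one replacement happens per element.

```r
# Hypothetical filenames; only the first one comes from the question.
files <- c("MA0051_IRF2.xml", "MA0052_MEF2A.xml", "MA0139_CTCF.xml")

# Same idea as above: match the whole name, keep only the captured group.
pattern <- ".*_([^.]+)[.].*"

# sub() is vectorised over its third argument, so no loop is needed.
genes <- sub(pattern, "\\1", files)
print(genes)
```

Note that sub returns an element unchanged when the pattern does not match, so names without both an underscore and a dot would pass through as-is and may need a separate check.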
Another possibility, with the stringr package, uses a lookbehind and a lookahead:
library(stringr)
str_extract(txt, "(?<=_).+(?=\\.)")
Here's a possible solution that doesn't require regex knowledge:
txt <- "MA0051_IRF2.xml"
library(qdap)
genXtract(txt, "_", ".")
## _ : .
## "IRF2"
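If installing an extra package is not an option, a regex-free variant can also be sketched in base R. This assumes each filename has a single extension and that the wanted token is the last underscore-delimited piece of the name; both are assumptions about the data, not something stated in the question.

```r
txt <- "MA0051_IRF2.xml"

# Drop the extension using the 'tools' package that ships with R.
stem <- tools::file_path_sans_ext(txt)

# Split on a literal underscore (fixed = TRUE disables regex interpretation).
parts <- strsplit(stem, "_", fixed = TRUE)[[1]]

# Keep the last piece, i.e. the text after the final underscore.
gene <- parts[length(parts)]
print(gene)
```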