I have the following string in R:

 str <- "number_123 some text number_4"

Now I want to extract the numbers 123 and 4 into a numeric vector. However, I could not come up with a regular expression that identifies them. The only identifier in this problem is "number_". I would like to extract the number that follows it, which can be 1 to 3 digits long.

I found regular expressions for similar problems here, but I was not able to adapt them to fit my case.

Thanks for your help!

Edit: Sorry for not being more precise. The actual string looks like this:

str <- '\"number_123\"somtext 123 some more text\"number_1\" text'

As before, I would like to extract the numbers following the substring \"number. Unfortunately, none of the suggested solutions worked on this string. I got the following warning message:

NAs introduced by coercion
lorenzbr
1 Answer

Ugly, but works:

foo <- "number_123 some text number_4"
as.numeric(gsub("number_", "", grep("number_", unlist(strsplit(foo, " ")), value = TRUE)))

A more readable solution using the magrittr pipe, applied to the string from the edit:

library(magrittr)
'\"number_123\"somtext 123 some more text\"number_1\" text' %>%
    strsplit(" ") %>% # Split character string per space
    unlist() %>%
    grep("number_", ., value = TRUE) %>% # Extract "words" with number_
    gsub("number_", "", .) %>% # Remove "number_" part
    gsub('"', "", .) %>% # Remove the double quotes
    gsub("[a-z]", "", .) %>% # Remove any remaining lowercase letters
    as.numeric() # Turn into numbers

[1] 123   1
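
A regular expression can also pull out the matches directly, without splitting on spaces first. The following is only a minimal base R sketch, assuming the digits always follow "number_" immediately and are 1 to 3 digits long, as the question describes; the variable names txt, hits and nums are made up for illustration.

```r
# The string from the question's edit (txt is a hypothetical variable name).
txt <- '\"number_123\"somtext 123 some more text\"number_1\" text'

# Find every occurrence of "number_" followed by 1 to 3 digits.
hits <- regmatches(txt, gregexpr("number_[0-9]{1,3}", txt))[[1]]

# Strip the literal "number_" prefix and convert what is left to numeric.
nums <- as.numeric(sub("number_", "", hits, fixed = TRUE))
nums
```

Because the pattern requires the "number_" prefix, the standalone 123 in the middle of the string is not matched, and no letters or quotes reach as.numeric(), which avoids the coercion warning.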
pogibas