
I have a data.frame with two variables containing strings like "ABC`w/XYZ 8", where w is any number from 1 to 999. I need to extract w and replace the whole string with it. I use this code:

df <- data.frame(a = c("ABC`5/XYZ 8", "A`25/BHU 19", "ach`246/chy 0"), b = c("sfse`3/cjd 65", "jlke`234/Chu 19", "h`45/hy 0"))

df$a <- sapply(df$a, function(x) {substr(df$a[x], regexpr("`[0-9]+/", df$a[x]) +1,
+  regexpr("`[0-9]+/", df$a[x]) + attr(regexpr("`[0-9]+/", df$a[x]), "match.length")-2)})

It runs, but instead of a = c(5, 25, 246) I get a = c(25, 5, 246). I guess this happens because a is of class factor. However, when a is of class character, I get NAs as output. Is there a way to preserve the order of a, or to use sapply and substr on a character vector?

Semyon Tamara
    Possible duplicate of [how to extract the first number from each string in a vector in R?](http://stackoverflow.com/questions/25885361/how-to-extract-the-first-number-from-each-string-in-a-vector-in-r). Another very relevant post: [extract first number from string](http://stackoverflow.com/questions/23323321/r-extract-first-number-from-string) – Jota Oct 31 '16 at 16:58

1 Answer


We can use sub to extract the number in the 'w' position of the string. Match one or more letters or backticks ([A-Za-z`]+), capture the one or more digits that follow as a group ((\\d+)), match the remaining characters (.*), and replace the whole match with the backreference to the capture group (\\1).

as.numeric(sub("[A-Za-z`]+(\\d+).*", "\\1", df$a))
#[1]   5  25 246
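
Since the data frame in the question has two such columns, the same call can be applied to each of them. This is a minimal sketch, assuming the columns are a and b as in the question and that every value contains a backtick followed by digits:

```r
df <- data.frame(
  a = c("ABC`5/XYZ 8", "A`25/BHU 19", "ach`246/chy 0"),
  b = c("sfse`3/cjd 65", "jlke`234/Chu 19", "h`45/hy 0")
)

# sub() is vectorised and coerces factors to character itself,
# so no sapply() over individual elements is needed
extract_w <- function(col) as.numeric(sub("[A-Za-z`]+(\\d+).*", "\\1", col))

# df[] <- keeps the data.frame structure while replacing every column
df[] <- lapply(df, extract_w)
df
```

Because the whole column is processed at once, the row order is preserved whether the columns are factors or character vectors.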

Or another option is str_extract

library(stringr)
as.numeric(str_extract(df$a, "\\d+"))
#[1]   5  25 246
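
As for why the sapply code in the question reorders the values, a likely explanation is that sapply passes each element of a (not its position) as x, and df$a[x] then uses that element as an index. The sketch below illustrates this with a small made-up vector rather than the question's data. It assumes a was a factor, which was the data.frame default before R 4.0.0:

```r
f <- factor(c("b", "a", "c"))   # levels are sorted: "a", "b", "c"
as.integer(f)                    # integer codes: 2 1 3

# Indexing with a factor uses its integer codes, so f[f[1]] is f[2], not f[1]
by_factor <- sapply(f, function(x) as.character(f[x]))

# A character index is looked up by name; the vector has no names, so we get NA
s <- c("b", "a", "c")
by_name <- sapply(s, function(x) s[x], USE.NAMES = FALSE)

# Iterating over positions instead of values keeps the original order
by_position <- sapply(seq_along(s), function(i) s[i])
```

Here by_factor comes back as "a", "b", "c" (reordered by level codes), by_name is all NA, and by_position matches the input. Vectorised functions such as sub or str_extract avoid the issue altogether.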
akrun