21

I have a string such as "3.1 ml" or "abc 3.1 xywazw"

I'd like to extract "3.1" from this string. I have found many questions on Stack Overflow about extracting numbers from a character string, but none of the solutions handles decimal numbers.

epo3
Stéphane Laurent

5 Answers

21

This approach makes the decimal point and decimal fraction optional and allows multiple numbers to be extracted:

str <- " test 3.1 test 5"
as.numeric(unlist(regmatches(str, gregexpr("[[:digit:]]+\\.*[[:digit:]]*", str))))
#[1] 3.1 5.0

The concern about negative numbers can be addressed with an optional leading minus sign, written here as a Perl-style atomic group (hence `perl=TRUE`):

str <- " test -4.5 3.1 test 5"
as.numeric(unlist(regmatches(str, gregexpr("(?>-)*[[:digit:]]+\\.*[[:digit:]]*", str, perl=TRUE))))
#[1] -4.5  3.1  5.0
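
As a rough sketch of how the pattern above might be wrapped for reuse, the base-R helper below applies it to a whole character vector and returns one numeric vector per input string. The name `extract_numbers` is hypothetical and not part of the answer, and the sign is written as a plain `-?`, which assumes at most one minus sign precedes a number.

```r
# Hypothetical helper (not from the answer): extract every number from each string.
# Assumption: a number is an optional minus sign, digits, and an optional
# decimal point followed by optional fractional digits.
extract_numbers <- function(x) {
  pattern <- "-?[[:digit:]]+\\.?[[:digit:]]*"
  matches <- regmatches(x, gregexpr(pattern, x))  # list: one character vector per input
  lapply(matches, as.numeric)                     # convert each vector of matches
}

res <- extract_numbers(c("3.1 ml", "abc 3.1 xywazw", " test -4.5 3.1 test 5"))
print(res)
```

A hyphen placed directly before a digit (as in "item-5") would be read as a minus sign, so the pattern may need adjusting for that kind of input.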
IRTFM
20

Use the stringr library:

library(stringr)
x <- "abc 3.1 xywazw"
str_extract(x, "\\d+\\.*\\d*")
#[1] "3.1"
tcash21
8

Regular expression for floating point number from http://www.regular-expressions.info/floatingpoint.html with minor adjustment to work in R.

s <- "1e-6 dkel"
regmatches(s, gregexpr("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?", s))
#[[1]]
#[1] "1e-6"
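
The matches above are still character strings. A minimal sketch of the follow-up conversion, assuming the same pattern and a made-up input string with several number formats, could look like this; `as.numeric` understands the exponent notation directly.

```r
# Illustrative input (not from the answer) mixing exponent and plain decimal forms.
s <- "1e-6 dkel -2.5E3 and .75"
pattern <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"

# gregexpr finds all matches; [[1]] takes the matches for the single input string.
tokens <- regmatches(s, gregexpr(pattern, s))[[1]]
values <- as.numeric(tokens)  # parses "1e-6", "-2.5E3" and ".75" as numbers

print(tokens)
print(values)
```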
Wojciech Sobala
1

You can use regular expressions:

> str <- " test 3.1 test"
> as.numeric(regmatches(str,regexpr("[[:digit:]]+\\.[[:digit:]]+",str)))
[1] 3.1

`regexpr` returns the start position and length of the first match in the string. `regmatches` returns the matched text. You can then convert it to a number with `as.numeric`.
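
To make the two steps visible, the sketch below inspects what `regexpr` returns and rebuilds the match by hand with `substr` before comparing it to `regmatches`. The variable names are illustrative only, and the no-match input is an assumption added to show the empty result.

```r
s <- " test 3.1 test"
pattern <- "[[:digit:]]+\\.[[:digit:]]+"

m <- regexpr(pattern, s)
start <- as.integer(m)          # start position of the first match: 7
len <- attr(m, "match.length")  # length of the match: 3

# Rebuilding the match by hand gives the same text that regmatches returns.
by_hand <- substr(s, start, start + len - 1)
matched <- regmatches(s, m)
print(c(by_hand, matched))

# With no match, regexpr returns -1 and regmatches returns an empty vector,
# so as.numeric yields numeric(0) rather than NA.
no_match <- as.numeric(regmatches("no digits here", regexpr(pattern, "no digits here")))
print(length(no_match))
```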

Thibaud Ruelle
  • My upvote was locked in. I tried to reverse it when I realized the "." was being used inappropriately. It needs to be escaped. Perhaps you can fix that error and the upvote will then be earned. – IRTFM Oct 08 '13 at 16:57
  • @DWin If you input the text " test 3p1 test" for instance, it is not matched. So I am not sure the "." needs to be escaped here. – Thibaud Ruelle Oct 08 '13 at 19:34
  • "3p1" is matched, but then converted to NA by `as.numeric`. – IRTFM Oct 08 '13 at 19:42
  • Please forgive my silly mistake ... I edited my answer. Thanks for noticing. – Thibaud Ruelle Oct 08 '13 at 20:30
1
readr::parse_number(c("abc 3.1 xywazw", "-3.1 ml", "1,234.56"))
# [1]    3.10   -3.10 1234.56
LMc
  • If there is more than one number within a string this only extracts the first instance. – LMc Jul 25 '23 at 20:13