1

All, I am trying to extract numeric values from text strings, and I found the thread "Extracting decimal numbers from a string". However, my strings contain numbers without a leading zero, and the solution from that thread drops the decimal point. For example:

> library(stringr)
> str <- "the value is .55"
> as.numeric(str_extract(str, "\\d+\\.*\\d*"))
[1] 55

I am hoping to recover 0.55 instead of 55. Any help is greatly appreciated!

H.Hung

2 Answers

2

Use str_extract_all if you have more than one value per string. The key is to swap the + (one or more) for a * (zero or more) on the integer part, and to require at least one digit after the decimal point instead.

str <- "the value is .55 or 0.9 and 89"
library(stringr)

as.numeric(unlist(str_extract_all(str, "\\d*\\.*\\d+")))
[1]  0.55  0.90 89.00
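
To see why moving the quantifiers matters, the two patterns can be compared on the string from the question. This is a minimal sketch that uses base R's `regmatches()` and `regexpr()` instead of stringr so that it runs without extra packages; `first_match` is a hypothetical helper name, not part of either library.

```r
# Hypothetical helper: return the first substring of `s` that matches `pattern`.
first_match <- function(s, pattern) regmatches(s, regexpr(pattern, s))

s <- "the value is .55"

# \\d+ needs a digit before the dot, so the match can only start at the first "5".
old <- first_match(s, "\\d+\\.*\\d*")

# \\d* lets the integer part be empty, so the match can start at the dot.
new <- first_match(s, "\\d*\\.*\\d+")

print(c(old = old, new = new))
print(as.numeric(new))
```

If strings such as "1..5" can occur, `\\.?` (at most one dot) may be a safer choice than `\\.*`, which also accepts repeated dots.
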
Andre Wildberg
0

In base R we can do

> x <- 'the value is .55 or 0.9 and 89'
> x1 <- "the value is .55"
> f <- \(x) as.numeric(el(regmatches(x, gregexpr('[0|\\.]?\\d+\\.?\\d+', x))))
> f(x)
[1]  0.55  0.90 89.00
> f(x1)
[1] 0.55
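
One caveat, offered as a side note: the pattern above needs at least two digits per match (`\\d+` appears twice), so a lone integer such as 7 is skipped, and the `|` inside the brackets is a literal character rather than alternation. The sketch below contrasts it with a simpler pattern; `f2` is a hypothetical name, and the extended input string is made up for illustration.

```r
# Pattern from the answer above, kept unchanged for comparison.
f <- function(x) as.numeric(regmatches(x, gregexpr('[0|\\.]?\\d+\\.?\\d+', x))[[1L]])

# Hypothetical variant: optional integer part, at most one dot, at least one digit.
f2 <- function(x) as.numeric(regmatches(x, gregexpr('\\d*\\.?\\d+', x))[[1L]])

x2 <- 'the value is .55 or 0.9 and 89, plus 7'  # made-up input with a single-digit integer
print(f(x2))   # the trailing 7 is not picked up
print(f2(x2))  # all four numbers are picked up
```
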
jay.sf
  • A little expansion on `el`'s use here likely quite useful to those of us following along ... – Chris Jul 25 '23 at 21:11
  • @chris `el(x)` is very similar to `x[[1]]` but saves one byte. – jay.sf Jul 25 '23 at 21:12
  • But, why is the expansion of `object[i][[i]]` (typically replaceable by `object[[i]]`) more serviceable here? Not wishing to annoy, just a very 'unique' expression within the context of many regex question responses. – Chris Jul 25 '23 at 21:17
  • @Chris Good question (but it's `object[where][[1L]]` rather than `object[i][[i]]`). Documentation says _"el(object, i) is equivalent to object[i][[1]] (and should typically be replaceable by object[[i]])."_ The reason why `el(1)`, i.e. when `where` is missing, doesn't actually yield an error may lie somewhere in the C source code of the bracket function [`\`[\`()`](https://github.com/search?q=SEXP%20attribute_hidden%20do_subset+repo:wch/r-source&type=Code). – jay.sf Jul 25 '23 at 21:34
  • +1, yes, my above notation is wrong, not `[[i]]` but `[[1L]]`. Closer read required. So, what data would force `where` missing to demonstrate this? Neither `x` nor `x1` above appear to, unless I've misinterpreted and `object[where][[1L]]` means no leading `0.`, and missing is in `x`. – Chris Jul 25 '23 at 21:53
  • @Chris Sorry, my mind was somewhere deep in the source code. You can literally replace it with `f <- \(x) as.numeric(regmatches(x, gregexpr('[0|\\.]?\\d+\\.?\\d+', x))[[1L]])`. I just like `el(x)` better than `x[[1L]]` because it's more elegant. Your question about why the former might be better than the latter I cannot answer, because I currently don't understand why `el(1)` doesn't throw an `Error: object 'where' not found`. – jay.sf Jul 25 '23 at 22:07
  • @Chris Your (or rather my) question has been [answered](https://stackoverflow.com/q/76766868/6574038) satisfactorily. While `el(c(1, 2))` gives 1, `el(c(1, 2), 2)` yields 2. In the first case, where `where=` is missing, [lazy evaluation](https://bookdown.dongzhuoer.com/hadley/adv-r/lazy-evaluation.html) prevents occurrence of an error. – jay.sf Jul 26 '23 at 06:23
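
The behaviour discussed in these comments can be reproduced without `el()` itself. This is a minimal sketch, assuming the documented expansion `object[where][[1]]`; `el_like` is a hypothetical stand-in, not the function from the methods package.

```r
# Hypothetical stand-in that mirrors the documented expansion object[where][[1]].
el_like <- function(object, where) object[where][[1L]]

# `where` is never supplied here; it reaches `[` as a missing subscript,
# so object[] keeps every element and [[1L]] then picks the first one.
a <- el_like(c(1, 2))

# With `where = 2`, object[2] keeps only the second element.
b <- el_like(c(1, 2), 2)

print(c(a, b))
```

Because nothing inside the function forces `where` on its own, the missing argument does not raise an error, which appears to match the explanation given in the linked answer.
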