
This should be pretty easy, but the results I get after following suggestions from other SO posts leave me baffled. And, of course, I'd like to avoid using a for loop.

Reproducible example

library(stringr)
input <- "<77Â 500 miles</dd>"
mynumbers <- str_extract_all(input, "[0-9]")

The variable mynumbers is a list containing one character vector of five digits:

> mynumbers
[[1]]
[1] "7" "7" "5" "0" "0"

But this is what I'm after:

> mynumbers
[1] 77500

This post suggests using paste(), and I guess this should work fine given the correct sep and collapse arguments, but I have got to be missing something essential here. I have also tried to use unlist(). Here is what I've tried so far:

1 - using paste()

> paste(mynumbers)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

2 - using paste()

> paste(mynumbers, sep = " ")
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

3 - using paste()

> paste (mynumbers, sep = " ", collapse = NULL)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

4 - using paste()

> paste (mynumbers, sep = "", collapse = NULL)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

5 - using unlist()

> as.numeric(unlist(mynumbers))
[1] 7 7 5 0 0

I'm hoping some of you have a few suggestions. I guess there's an elegant solution using regex somehow, but I'm also very interested in the paste / unlist problem that is specific to R. Thanks!

vestland
  • Possible duplicate of [Extract numeric part of strings of mixed numbers and characters in R](http://stackoverflow.com/questions/15451251/extract-numeric-part-of-strings-of-mixed-numbers-and-characters-in-r) – 989 Sep 26 '16 at 09:50
    Maybe `as.numeric(paste(str_extract_all(input, "[0-9]", simplify = TRUE), collapse = ""))` ? – zx8754 Sep 26 '16 at 10:09
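The `simplify = TRUE` route from the comment above also extends to several input strings. Below is a minimal sketch that uses a hypothetical second string, `"<1 200 miles</dd>"`, added only for illustration. The matrix has one row per input string, and shorter rows are padded with empty strings. Collapsing each row therefore gives one number per input.

```r
library(stringr)

# The question's string (the Â is written as a \u escape) plus a
# hypothetical second string used only to illustrate several inputs
inputs <- c("<77\u00c2 500 miles</dd>", "<1 200 miles</dd>")

# simplify = TRUE returns a character matrix: one row per input string,
# shorter rows padded with "" so the padding disappears when collapsed
digits <- str_extract_all(inputs, "[0-9]", simplify = TRUE)

# Collapse each row into one string, then convert to numeric
as.numeric(apply(digits, 1, paste, collapse = ""))
```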

2 Answers


str_extract_all returns a list, so we need to pull out the vector before pasting. To extract a list element we use [[, and because there is only a single element, mynumbers[[1]] gives the vector. Then do the paste with collapse and apply as.numeric.

as.numeric(paste(mynumbers[[1]],collapse=""))
#[1] 77500
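The attempts in the question returned a deparsed string because `paste()` received the list rather than the character vector inside it. The short sketch below contrasts the calls. It uses the question's input, with the Â written as a `\u` escape so the script stays plain ASCII. The key distinction is that `sep` only separates multiple arguments, while `collapse` joins the elements of a single vector.

```r
library(stringr)

# Same string as in the question; the Â is written as a \u escape
input <- "<77\u00c2 500 miles</dd>"
mynumbers <- str_extract_all(input, "[0-9]")

# paste() on the list deparses its single element into one string
paste(mynumbers)
#[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

# sep only separates multiple arguments, so the five digits stay apart
paste(mynumbers[[1]], sep = "")
#[1] "7" "7" "5" "0" "0"

# collapse joins the elements of one vector into a single string
paste(mynumbers[[1]], collapse = "")
#[1] "77500"

# unlist() flattens the list and works just as well for a single input string
as.numeric(paste(unlist(mynumbers), collapse = ""))
#[1] 77500
```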

We can also match one or more non-digit characters (\\D+), replace them with "" using gsub, and convert the result to numeric.

as.numeric(gsub("\\D+", "", input))
#[1] 77500
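Because `gsub()` is vectorised, the same one-liner cleans many strings at once. The sketch below uses two made-up strings alongside the question's input, added only for illustration. The third string shows a caveat: `\\D+` treats a decimal point as a non-digit, so `3.5` becomes `35`. The variant that keeps `.` assumes `.` is always the decimal mark and never a thousands separator.

```r
# The question's string (Â as a \u escape) plus two hypothetical examples
inputs <- c("<77\u00c2 500 miles</dd>", "<1 200 miles</dd>", "<3.5 miles</dd>")

# Remove every run of non-digits; "." is a non-digit too, so 3.5 becomes 35
as.numeric(gsub("\\D+", "", inputs))

# Assumption: "." is a decimal mark, so keep it along with the digits
as.numeric(gsub("[^0-9.]+", "", inputs))
```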
akrun

An alternative using the stringr library:

str_remove_all(input, pattern = "\\D+") %>% as.numeric()
#[1] 77500
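The `%>%` pipe comes from magrittr, and stringr re-exports it, so `library(stringr)` is enough for the line above. On R 4.1.0 or later, the native pipe `|>` does the same job without relying on that re-export. Here is a minimal sketch with the question's input, where the Â is written as a `\u` escape:

```r
library(stringr)

# Same string as in the question; the Â is written as a \u escape
input <- "<77\u00c2 500 miles</dd>"

# Native pipe (R >= 4.1.0): drop every run of non-digits, then convert
mynumber <- input |> str_remove_all("\\D+") |> as.numeric()
mynumber
```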