61

I want to test a character vector and see which elements could actually be numeric. I can use a regex to test for integers successfully, but I'm looking to see which elements consist only of digits and at most one decimal point. Below is what I've tried:

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r")
!grepl("[^0-9]", x)   #integer test

grepl("[^0-9[\\.{0,1}]]", x)  # I know it's wrong but don't know what to do

I'm looking for a logical output so I'd expect the following results:

[1] TRUE TRUE TRUE TRUE FALSE FALSE
Tyler Rinker
  • 3
    what about !is.na(as.numeric(x)) ? edit: Oh, I see someone answered with that as I was double checking it worked on your example (to check it worked as required prior to pressing 'Add comment') – Glen_b Nov 30 '12 at 03:03
  • I just realized there may be NAs already in the string. – Tyler Rinker Nov 30 '12 at 03:07
  • 2
    If you want to distinguish NAs as well, try this: `ifelse(is.na(x), NA, TRUE) & is.na(as.numeric(x))`. – Josh O'Brien Nov 30 '12 at 03:48

6 Answers

78

Maybe there's a reason some other pieces of your data are more complicated that would break this, but my first thought is:

> !is.na(as.numeric(x))
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE

As noted below by Josh O'Brien this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you needed to include those as "plausibly numeric" one route would be to pick them out with a regex first,

> x <- c("1.2","1e4","1.2.3","5L")
> x
[1] "1.2"   "1e4"   "1.2.3" "5L"   
> grepl("^[[:digit:]]+L$",x)
[1] FALSE FALSE FALSE  TRUE

...and then strip the "L" from just those elements using gsub and indexing.
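A minimal sketch of that last step, assuming the same `x` as in the example above. The names `has_L`, `cleaned`, and `plausibly_numeric` are introduced here for illustration only, and the warning from `as.numeric` is silenced with `suppressWarnings` as suggested in the comments.

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# Flag elements that are digits followed by a single trailing "L"
has_L <- grepl("^[[:digit:]]+L$", x)

# Strip the "L" from just those elements, leaving the others untouched
cleaned <- x
cleaned[has_L] <- gsub("L$", "", cleaned[has_L])

# "1.2.3" still cannot be converted, so as.numeric() would warn here
plausibly_numeric <- !is.na(suppressWarnings(as.numeric(cleaned)))
plausibly_numeric
## [1]  TRUE  TRUE FALSE  TRUE
```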

joran
  • 3
    The simplicity of it. Brilliant. : embarrassed: – Tyler Rinker Nov 30 '12 at 03:03
  • @TylerRinker I almost got sucked down the same path you did when I first read the question. Then I realized that someone smarter than me had already walked that road, and at the end of that road was `as.numeric`. – joran Nov 30 '12 at 03:04
  • 8
    @joran - Is there an alternative method to not output the warnings, or would the best bet to be just wrap it in `suppressWarnings` and get on with it? – thelatemail Nov 30 '12 at 03:06
  • 2
    @Joran what if there's already NA in the string? Nevermind use: `!is.na(as.numeric(na.omit(x)))` In this case this will work but may not for other future searchers. – Tyler Rinker Nov 30 '12 at 03:09
  • 1
    @TylerRinker What is the issue there? An NA should still give FALSE shouldn't it? Is that not your desired outcome? – Dason Nov 30 '12 at 03:10
  • FWIW, this identifies `"7e6"` as a number, but not `"7L"`. – Josh O'Brien Nov 30 '12 at 03:12
  • 4
    @thelatemail Not that I know of. I think `suppressWarnings` would probably be the way to go. – joran Nov 30 '12 at 03:12
  • @JoshO'Brien True. At least those could be caught beforehand with a relatively simple regex, I suppose. – joran Nov 30 '12 at 03:16
  • @JoshO'Brien That's ... odd behaviour there on `"7L"`; I'd expect `as.numeric` to make that work, since is.numeric(7L) is TRUE – Glen_b Nov 30 '12 at 03:16
  • 1
    Even `as.integer("1L")` returns NA. – IRTFM Nov 30 '12 at 03:31
  • @Dason, I'm actually first testing if the entire string is "numeric" and then operating on individual elements after that. – Tyler Rinker Nov 30 '12 at 03:41
  • Factors would return TRUE for this. I would add as.character to prevent this: `!is.na(as.numeric(as.character(x)))` – Scott Nov 15 '21 at 03:37
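
The points raised in these comments (the coercion warnings, values that are already NA, and factors) could be folded into one small helper. The following is only a sketch: the function name `is_numeric_string` and its `keep_na` argument are hypothetical and do not come from the thread.

```r
# Hypothetical helper (name and arguments are illustrative only)
is_numeric_string <- function(x, keep_na = FALSE) {
  # Convert factors to their labels first; as.numeric() on a factor
  # returns the internal integer codes rather than the labels
  x_chr <- as.character(x)
  # Elements that cannot be converted become NA; silence the warning
  ok <- !is.na(suppressWarnings(as.numeric(x_chr)))
  # Optionally report NA (instead of FALSE) for elements that were already missing
  if (keep_na) ok[is.na(x_chr)] <- NA
  ok
}

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", NA)
is_numeric_string(x)
## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
is_numeric_string(x, keep_na = TRUE)
## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE    NA
is_numeric_string(factor(c("a", "1.5")))
## [1] FALSE  TRUE
```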
9

I recently encountered a similar problem where I was trying to write a function to format values passed as a character string from another function. The formatted values would ultimately end up in a table and I wanted to create logic to identify NA, character strings, and character representations of numbers so that I could apply sprintf() on them before generating the table.

Although more complicated to read, I do like the robustness of the grepl() approach. I think this gets all of the examples brought up in the comments.

x <- c("0",37,"42","-5","-2.3","1.36e4","4L","La","ti","da",NA)

y <- grepl("^([-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+)$",x)

This evaluates to (formatted to help with visualization):

x
[1] "0"  "37"   "42"  "-5"   "-2.3"   "1.36e4" "4L" "La"     "ti"     "da"     NA 

y
[1] TRUE  TRUE   TRUE  TRUE   TRUE     TRUE    TRUE FALSE   FALSE    FALSE    FALSE

The regular expression is TRUE for:

  • positive or negative numbers with no more than one decimal OR
  • positive or negative integers (e.g., 4L) OR
  • positive or negative numbers in scientific notation

Additional terms could be added to handle decimals without a leading digit or numbers with a decimal point but not digits after the decimal if the dataset contained numbers in poor form.
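
One way to sketch such additional terms is shown below. The pattern `num_pattern` is an illustration rather than the regex from this answer: it is anchored with `^` and `$` so the whole string has to match, it adds an alternative for decimals without a leading digit, and it also allows a leading `+` and a signed exponent, which are assumptions about what should count as numeric.

```r
# Illustrative pattern (an assumption, not the answer's original regex):
#   optional sign, then either digits with an optional fraction or a
#   leading-dot fraction, then an optional exponent;
#   or an integer with an "L" suffix.
num_pattern <- "^[-+]?(([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?|[0-9]+L)$"

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", "-2.3", "1.36e4", "4L", "5.", NA)
is_num <- grepl(num_pattern, x)
is_num
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```

Because `grepl` returns FALSE for NA input, missing values are reported as non-numeric here.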

penguinv22
4

Avoid re-inventing the wheel with check.numeric() from package varhandle.

The function accepts the following arguments:

  • `v`: The character vector or factor vector. (Mandatory)
  • `na.rm`: logical. Should the function ignore NA? Default value is FALSE since NA can be converted to numeric. (Optional)
  • `only.integer`: logical. Only check for integers and do not accept floating point. Default value is FALSE. (Optional)
  • `exceptions`: A character vector containing the strings that should be considered as valid to be converted to numeric. (Optional)
  • `ignore.whitespace`: logical. Ignore leading and trailing whitespace characters before assessing if the vector can be converted to numeric. Default value is TRUE. (Optional)

qwr
1

Another possibility:

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", "1.2", "1e4", "1.2.3", "5L", ".22", -3)
locs <- sapply(x, function(n) {

    out <- try(eval(parse(text = n)), silent = TRUE)
    !inherits(out, 'try-error')

}, USE.NAMES = FALSE)

x[locs]
## [1] "0.33" ".1"   "3"    "123"  "1.2"  "1e4"  "5L"   ".22"  "-3"  

x[!locs]
## [1] "2.3.3" "1.2r"  "1.2.3"
Tyler Rinker
0

Inspired by the answers here, my function trims leading and trailing whitespace, can handle na.strings, and optionally treats NA as numeric-like. The regular expression was enhanced as well. See the help info for details. All you want!

#' check if a str obj is actually numeric
#' @description check if a str obj is actually numeric
#' @param x a str vector, or a factor of str vector, or numeric vector. x will be coerced and trimws.
#' @param na.strings case sensitive strings that will be treated to NA.
#' @param naAsTrue whether NA (including actual NA and na.strings) will be treated as numeric like
#' @return a logical vector (vectorized).
#' @export
#' @note Using regular expression
#' \cr TRUE for any actual numeric c(3,4,5,9.9) or c("-3","+4.4",   "-42","4L","9L",   "1.36e4","1.36E4",    NA, "NA", "","NaN", NaN): 
#' \cr positive or negative numbers with no more than one decimal c("-3","+4.4") OR
#' \cr positive or negative integers (e.g., c("-42","4L","39L")) OR
#' \cr positive or negative numbers in scientific notation c("1.36e4","1.36E4")
#' \cr NA, or na.strings
is.numeric.like <- function(x,naAsTrue=TRUE,na.strings=c('','.','NA','na','N/A','n/a','NaN','nan')){
    x = trimws(x,'both')
    x[x %in% na.strings] = NA
    # https://stackoverflow.com/a/21154566/2292993
    result = grepl("^[\\-\\+]?[0-9]+[\\.]?[0-9]*$|^[\\-\\+]?[0-9]+[L]?$|^[\\-\\+]?[0-9]+[\\.]?[0-9]*[eE][0-9]+$",x,perl=TRUE)
    if (naAsTrue) result = result | is.na(x)
    return((result))
}
Jerry T
-3

You can also use:

readr::parse_number("I am 4526dfkljvdljkvvkv")

To get 4526.

SteveS
  • 1
    this extracts the number from a string, but doesn't check if the string is actually numeric – qwr Aug 01 '19 at 01:41