Given that the base as.integer() coercion of the empty string is NA without warning, as in:

str( as.integer(c('1234','5678','')) ) # int [1:3] 1234 5678 NA -- no warning

I'm struggling to understand why bit64::as.integer64() coerces to zero without warning:

library('bit64')
str( as.integer64(c('1234','5678','')) ) # integer64 [1:3] 1234 5678 0 -- no warning

What's even stranger is to compare:

str( as.integer(c('1234','5678','', 'Help me Stack Overflow')) ) 
# int [1:4] 1234 5678 NA NA -- coercion warning

with:

str( as.integer64(c('1234','5678','', 'Help me Stack Overflow')) ) 
# integer64 [1:4] 1234 5678 0 NA -- no warning

My workaround for this fails miserably:

asInt64 <- function(s){
  require(bit64)
  ifelse(grepl('^\\d+$',s), as.integer64(s), NA_integer64_)
}
str(asInt64(c('1234','5678','', 'Help me Stack Overflow')) )
# num [1:4] 6.10e-321 2.81e-320 0.00 0.00
# huh?

So, I'm asking:

  • why does this happen?

  • what is the best workaround?

C8H10N4O2
  • Maybe because [`strtoll("", ...)`](https://github.com/cran/bit64/blob/master/src/integer64.c#L200) is `0`. Workaround could be to convert these `grepl("\\D|^$", c('1234','5678','', 'Help me Stack Overflow'))` to `NA` afterwards? – lukeA Sep 20 '17 at 22:40
  • @lukeA you basically answered the question, thanks. This [reference](https://www.techonthenet.com/c_language/standard_library_functions/stdlib_h/strtoll.php) suggests testing for a conversion error when `strtoll` returns 0, which `as.integer64` isn't quite catching with the `endpointer` logic. I'm going to try to propose a change, although my C is quite rusty. If you want to post your comment as an answer I'll accept it. – C8H10N4O2 Sep 21 '17 at 14:54

1 Answer

Why it happens

As @lukeA's comment points out, the source for as.integer64.character is:

SEXP as_integer64_character(SEXP x_, SEXP ret_){
  long long i, n = LENGTH(ret_);
  long long * ret = (long long *) REAL(ret_);
  const char * str;
  char * endpointer;
  for(i=0; i<n; i++){
    str = CHAR(STRING_ELT(x_, i)); endpointer = (char *)str; // thanks to Murray Stokely 28.1.2012
    ret[i] = strtoll(str, &endpointer, 10);
    if (*endpointer)
      ret[i] = NA_INTEGER64;
  }
  return ret_;
}

and strtoll returns zero when it cannot perform a conversion, as with "" or "ABCD" (some implementations also set errno to EINVAL in that case, but this is not guaranteed). One reference strtoll example handles this like:

/* If the result is 0, test for an error */
if (result == 0)
{
    /* If a conversion error occurred, display a message and exit */
    if (errno == EINVAL)
    {
        printf("Conversion error occurred: %d\n", errno);
        exit(0);
    }

    /* If the value provided was out of range, display a warning message */
    if (errno == ERANGE)
        printf("The value provided was out of range\n");
}

So what I am trying to figure out now is why *endpointer is evaluating to FALSE. (Stay tuned...)
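
A likely explanation, sketched in C under stated assumptions: when strtoll cannot convert anything, it returns 0 and stores the original string pointer in endpointer. For "" that pointer already sits on the terminating '\0', so *endpointer is false and the 0 is kept. For "ABCD" it sits on 'A', so the NA branch fires. The names passes_endpointer_check, parse_int64_or_na and NA_SENTINEL below are hypothetical and not part of bit64, and LLONG_MIN is only assumed to be the NA bit pattern.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumption: stand-in for bit64's NA_INTEGER64, taken to be LLONG_MIN. */
#define NA_SENTINEL LLONG_MIN

/* Mirrors the check in the quoted source: it only inspects *end.
   Returns 1 when the string would be accepted, 0 when it would become NA. */
int passes_endpointer_check(const char *str) {
    char *end = (char *)str;
    assert(str != NULL);
    (void)strtoll(str, &end, 10);
    return *end == '\0';
}

/* Hypothetical stricter parser: also rejects input with no digits at all. */
long long parse_int64_or_na(const char *str) {
    char *end = NULL;
    assert(str != NULL);
    errno = 0;
    long long value = strtoll(str, &end, 10);
    if (end == str) return NA_SENTINEL;       /* nothing consumed: "" or "ABCD" */
    if (*end != '\0') return NA_SENTINEL;     /* trailing characters: "12ab" */
    if (errno == ERANGE) return NA_SENTINEL;  /* value does not fit in long long */
    return value;
}
```

The extra `end == str` test is what separates "no digits at all" from "digits up to the end of the string"; checking errno alone would not be portable, because setting EINVAL on a failed conversion is optional.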

Workaround

Here's the workaround to mimic the behavior of base as.integer:

library(bit64)
charToInt64 <- function(s){
  stopifnot( is.character(s) )
  x <- as.integer64(s)
  # as.integer64("") unexpectedly returns zero without warning.  
  # Overwrite this result to return NA without warning, similar to base as.integer("")
  x[s==""] <- NA_integer64_
  # as.integer64("ABC") returns NA, but without the coercion warning.
  # Set NA explicitly and raise the same warning as base as.integer("ABC").
  # Note: '\\D' also flags signed input such as "-5".
  bad_strings <- grepl('\\D',s) # thanks to @lukeA for the hint
  if( any(bad_strings) ){
    warning('NAs introduced by coercion')
    x[bad_strings] <- NA_integer64_  
  }
  x
}

To see that this works:

test_string <- c('1234','5678','', 'Help me Stack Overflow')
charToInt64(test_string) # returns int64 [1] 1234 5678 <NA> <NA> with warning
charToInt64(head(test_string,-1)) # returns int64 [1] 1234 5678 <NA> without warning
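
A side note on the ifelse attempt in the question, offered as a probable explanation rather than a confirmed one: the source above writes the 64-bit integers into REAL(ret_), so an integer64 is an R double vector whose bytes are read as integers. ifelse returns a result without the integer64 class, so the same bytes get printed as plain doubles. The C sketch below (int64_bits_as_double is a hypothetical helper) reproduces the tiny numbers from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 8 bytes of a 64-bit integer as an IEEE 754 double,
   which is roughly what happens when the integer64 class is dropped. */
double int64_bits_as_double(int64_t value) {
    double out;
    /* Assumption: double is 8 bytes, as on the platforms R supports. */
    assert(sizeof(double) == sizeof(int64_t));
    memcpy(&out, &value, sizeof out);
    return out;
}
```

Under that reading, 1234 becomes 1234 times the smallest subnormal double (about 4.94e-324), which is roughly 6.10e-321, while 0 and the assumed NA pattern (only the sign bit set) both print as 0.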
C8H10N4O2