
In R, to find the length of a vector (bigz or not), one typically uses the length function. E.g.

NonBigZ <- 1:10

NonBigZ
[1]  1  2  3  4  5  6  7  8  9 10

length(NonBigZ)
[1] 10

However, with the gmp package, printing a bigz vector displays its length automatically. E.g.

library(gmp)

BigZ <- as.bigz(1:10)

BigZ
Big Integer ('bigz') object of length 10:  ## <<-- length given here
 [1] 1  2  3  4  5  6  7  8  9  10

## This seems redundant as it is already given above
length(BigZ)
[1] 10

I would like to retrieve that information without the extra call to length. I know length is fast, but when it is called many times, avoiding it could save a decent chunk of time. Observe:

system.time(sapply(1:10^6, function(x) length(BigZ)))
user  system elapsed 
7.81    0.00    7.84

I have tried attributes(BigZ) as well as str(BigZ) to no avail. I have read the gmp documentation as well, but couldn't find anything.

Joseph Wood
  • `gmp:::print.bigz` also calculates the length, using `gmp:::length.bigz`. It seems that `length.bigz` is not just an attribute-access function like `length` -- e.g. see `ns = c(1, 10, 50, 100, 200, 500, 1e3, 5e3, 1e4); timings = sapply(ns, function(n) { x = as.bigz(seq_len(n)); summary(microbenchmark(length(x), unit = "ms"))$median }); plot(ns, timings)`. I guess it might be worth saving the "length" as an attribute when creating a "bigz". – alexis_laz Jun 28 '16 at 18:33
  • Rather than putting your answer into your question (and leaving the question looking unresolved), you should just answer your own question. – Gregor Thomas Jun 28 '16 at 18:53
  • @Gregor, I was hesitant about posting as an answer as I wasn't sure if my answer was thorough enough. Anywho, I have taken your suggestion. – Joseph Wood Jun 28 '16 at 18:57
  • It seems to answer your question, so I think it's better as an answer than as a question. If someone comes along with a better answer you can always upvote/accept that one. I believe there's a 48 hour wait period on accepting your own answer anyway. – Gregor Thomas Jun 28 '16 at 19:04
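The simplest way to avoid repeated calls is to call length once before the loop and reuse the result. alexis_laz's suggestion of keeping the length as an attribute can be sketched as below. `cache_length` and `cached_length` are hypothetical helpers, not part of gmp. Whether gmp keeps such an attribute through its operations is an untested assumption; the reader falls back to `length()` if the attribute is missing, but a cached value can go stale if the object is later modified.

```r
## Hypothetical helpers (not part of gmp): store the length once as an
## attribute and read it back later instead of calling length() again.
cache_length <- function(x) {
  attr(x, "cached_length") <- length(x)  # dispatches to length.bigz for bigz
  x
}

cached_length <- function(x) {
  n <- attr(x, "cached_length", exact = TRUE)
  if (is.null(n)) length(x) else n       # fall back when the attribute is absent
}

## Usage with gmp (assumes gmp leaves the attribute on the object):
## BigZ <- cache_length(gmp::as.bigz(1:10))
## cached_length(BigZ)
```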

1 Answer


As @alexis_laz pointed out in the comments, gmp:::print.bigz already calculates the length but doesn't return it in any usable form. I did some digging into the gmp source code and found this:

print.bigz <- function(x, quote = FALSE, initLine = is.null(modulus(x)), ...)
{
  if((n <- length(x)) > 0) {
    if(initLine) {
      cat("Big Integer ('bigz') ")
      kind <- if(isM <- !is.null(nr <- attr(x, "nrow")))
        sprintf("%d x %d matrix", nr, n/nr)
      else if(n > 1) sprintf("object of length %d", n) else ""
      cat(kind,":\n", sep="")
    }
    print(as.character(x), quote = quote, ...)
  }
  else
    cat("bigz(0)\n")
  invisible(x)
}
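To see exactly which header strings any parser of the printed output has to cope with, the `kind` logic above can be replayed on its own. `bigz_header` is a hypothetical helper that mirrors that `sprintf` code. It assumes no modulus is set, because with a modulus `initLine` is `FALSE` and no header is printed at all.

```r
## Hypothetical helper: rebuilds the first line print.bigz would print for a
## bigz of length n (and optional "nrow" attribute nr), assuming no modulus.
bigz_header <- function(n, nr = NULL) {
  if (n == 0) return("bigz(0)")              # print.bigz prints only this
  kind <- if (!is.null(nr)) {
    sprintf("%d x %d matrix", nr, n / nr)    # bigz matrix
  } else if (n > 1) {
    sprintf("object of length %d", n)
  } else {
    ""                                       # a single element: no size text
  }
  paste0("Big Integer ('bigz') ", kind, ":")
}

bigz_header(10)          # "Big Integer ('bigz') object of length 10:"
bigz_header(10, nr = 2)  # "Big Integer ('bigz') 2 x 5 matrix:"
```

Note that only the plain-vector case has the length as its 7th space-separated word.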

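As you can see, print.bigz uses cat to print a header line (which contains the length) and then prints the values; it never returns the length, only x itself (invisibly). Following this question and this answer, you can capture the printed text with capture.output and parse the length out of it; however, this isn't nearly as efficient as simply calling length. Below is a very crude function for obtaining the length.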

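BigZLength <- function(x) {
    b <- capture.output(x)                   # the lines print(x) would show
    a <- strsplit(b[1], split=" ")[[1]][7]   # 7th word of the header, e.g. "10:"
    ## strip the trailing ":"; a length-1 bigz has no 7th word, so return 1
    ## (matrices, length-0 and modulus bigz objects are not handled)
    if (!is.na(a)) {as.integer(substr(a,1,nchar(a)-1))} else {1L}
}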

system.time(sapply(1:10^5, function(x) length(BigZ)))
 user  system elapsed 
0.67    0.00    0.67 

system.time(sapply(1:10^5, function(x) BigZLength(BigZ)))
 user  system elapsed 
24.57    0.01   24.71

I'm sure you could write a more efficient parser using regular expressions (or something else), but I don't believe it would be as efficient as simply calling length. In fact, capturing the printed output alone takes most of the time in the code above:
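A regular-expression version of the parser might look like this. `bigz_length_from_header` is a hypothetical helper that takes the first printed line rather than the bigz itself, so the parsing can be checked without gmp. It also handles the matrix and length-0 headers, but it assumes no modulus is set (then no header is printed). It still needs capture.output to get the header, so it should not be expected to beat length.

```r
## Hypothetical helper: extracts the length from the first line printed by
## gmp:::print.bigz. Assumes the bigz has no modulus (otherwise no header).
bigz_length_from_header <- function(header) {
  if (header == "bigz(0)") return(0L)                       # empty bigz
  m <- regmatches(header, regexec("object of length ([0-9]+):", header))[[1]]
  if (length(m) == 2) return(as.integer(m[2]))             # plain vector
  m <- regmatches(header, regexec("([0-9]+) x ([0-9]+) matrix:", header))[[1]]
  if (length(m) == 3) return(as.integer(m[2]) * as.integer(m[3]))  # matrix
  1L                                                        # no size text: one element
}

## Usage with gmp loaded:
## bigz_length_from_header(capture.output(BigZ)[1])
```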

system.time(sapply(1:10^5, function(x) capture.output(BigZ)))
 user  system elapsed 
20.00    0.00   20.03



A note about fetching the source code above

If you are familiar with R, you know that you can view the source code of a function by typing its name at the console, like so:

numbers::nextPrime
function (n) 
{
    if (n <= 1) 
        n <- 1
    else n <- floor(n)
    n <- n + 1
    d1 <- max(3, round(log(n)))
    P <- Primes(n, n + d1)
    while (length(P) == 0) {
        n <- n + d1 + 1
        P <- Primes(n, n + d1)
    }
    return(as.numeric(min(P)))
}
<environment: namespace:numbers>

However, sometimes this is not possible. For example, with gmp::print.bigz we obtain:

gmp::print.bigz
Error: 'print.bigz' is not an exported object from 'namespace:gmp'
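The error only means that print.bigz is not exported. If you just want to read the R code, the triple-colon operator (as in alexis_laz's comment) or utils::getS3method can still retrieve a registered but unexported method. The sketch below shows this on a base-R method and only touches gmp if it is installed. Functions from an installed package usually print without their source comments, which is one reason the downloaded source can still be worth having.

```r
## Retrieve S3 methods that a package registers but does not export.
## Base-R illustration: stats provides a print method for "lm" objects.
m1 <- utils::getS3method("print", "lm")   # look up the registered method
m2 <- stats:::print.lm                    # reach into the namespace directly

## The gmp equivalent (skipped when gmp is not installed):
if (requireNamespace("gmp", quietly = TRUE)) {
  print(utils::getS3method("print", "bigz"))
}
```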

Enter Joshua Ulrich’s awesome question and answer. Using the code he suggests below, you can download the source code of any package and unpack it in one line.

untar(download.packages(pkgs = "gmp",
                        destdir = ".",
                        type = "source")[,2])

This creates a gmp folder in your working directory containing the package's complete source code (not compiled code). The above source code was found in the .\gmp\R\biginteger.R file.
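After unpacking, you still need to find which file holds a given definition. A base-R sketch: `find_definition` is a hypothetical helper that scans the R/ subfolder for lines starting with `name <-`. It assumes definitions use that style, as print.bigz does; definitions written with `=` or generated programmatically would be missed.

```r
## Hypothetical helper: lists the R files in an unpacked package source
## directory that contain a line starting with "<fun_name> <-".
find_definition <- function(pkg_dir, fun_name) {
  r_files <- list.files(file.path(pkg_dir, "R"), pattern = "\\.[Rr]$",
                        full.names = TRUE)
  target <- paste0(fun_name, " <-")       # plain prefix match, no regex
  hits <- vapply(r_files, function(f) {
    any(startsWith(trimws(readLines(f, warn = FALSE)), target))
  }, logical(1))
  r_files[hits]
}

## Usage after the untar() call above:
## find_definition("gmp", "print.bigz")
```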

Joseph Wood