R Programming Efficiency - factorial decomposition of a number

Question

I am trying to solve a kata from codewars called factorial decomposition in R. The aim of the kata is to decompose n! (factorial n) into its prime factors. The function should return a string like

decomp(12) -> "2^10 * 3^5 * 5^2 * 7 * 11"

I was able to solve it, but reached a server timeout (passing 74 assignments). I triet to optimize it a bit (lapply of pointwise()), but the essential core (for-while-loop) I was not able to alter.

Any help would be appreciated, as I put already more time into it than I should have.

##' A function for a factorial decomposition of a number
##' @title decomp
##' @param n integer
##' @return a String with the factorial decomposition
##' @author krisselack

decomp <- function(n) {

  # https://stackoverflow.com/questions/19767408/prime-number-function-in-r
  is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)

  p <- 2:n
  primes <- p[as.logical(vapply(p, is.prime, 1))]
  erg <- NULL

  pointwise <- function(x) {

    primloop <- primes[primes<=x]

    for(j in primloop){

      while(x %% j == 0){
        x <- x/j
        erg <- c(erg, j)
      }
    }

    if(length(erg)>0)
      return(erg)
  }

  erg2 <- unlist(lapply(p, pointwise))

  ergfin <- table(erg2)

  namen <- paste(ifelse(ergfin>1, paste0(names(ergfin), "^", ergfin),
                        paste(names(ergfin))),
                 collapse = " * ")

  return(namen)
}

decomp(5) # -> "2^3 * 3 * 5"
decomp(12) # -> "2^10 * 3^5 * 5^2 * 7 * 11"
decomp(17) # -> "2^15 * 3^6 * 5^3 * 7^2 * 11 * 13 * 17"
decomp(25) # -> "2^22 * 3^10 * 5^6 * 7^3 * 11^2 * 13 * 17 * 19 * 23"

FYI `decomp(1000)` runs instantly on my laptop and `decomp(10000)` took less than 2 seconds. Do you know what order of magnitude made it timeout? — asachet, Aug 29 '19 at 09:00
Thanks. With the other katas there were around 100-120 tests. I passed 65-75 tests. Server time max 12 secs. — Krisselack, Aug 29 '19 at 09:03
At least to me it is hidden. Here is the link to the kata: https://www.codewars.com/kata/5a045fee46d843effa000070 — Krisselack, Aug 29 '19 at 09:05
You could probably improve performance by doing it the other way around: loop on the primes between 2 and n and figure out how many factors they have in the factorial. You'd have to handle multiplicity though. — asachet, Aug 29 '19 at 09:06
This is how I started out. The factorial gets too big though. For instance, modulo is not working properly for larger floats. — Krisselack, Aug 29 '19 at 09:08
You don't have to compute the factorial to compute how many `2`'s or `3`'s or `p`'s there are in the decomp of `n!` — asachet, Aug 29 '19 at 09:11
Tried that, then I hit the constraint of Max Buffer Size Reached (1.5 MiB). pointwise <- function(x) { primloop <- p[p>=x] for(j in primloop){ while(j %% x == 0){ j <- j/x erg <- c(erg, x) } } if(length(erg)>0) return(erg) } erg2 <- unlist(lapply(primes, pointwise)) — Krisselack, Aug 29 '19 at 09:23
Note that another bottleneck for high `n` is the computation of the primes. You could use a sieve of Eratosthenes instead. — asachet, Aug 29 '19 at 09:29
Nope, found a version here: https://gist.github.com/seankross/5946396. Same. — Krisselack, Aug 29 '19 at 09:38
Not the same, the principle of the sieve is that you loop only from the previous prime, which is much faster than considering all possible prime factors at every step. You are doing trial division, not the sieve. There is a good explanation on wikipedia. FYI I have added a benchmark in my answer. — asachet, Aug 29 '19 at 09:42

asachet · Accepted Answer · 2019-08-29T09:47:10.840

library("purrr")

# https://stackoverflow.com/questions/19767408/prime-number-function-in-r
is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)

#' Multiplicity of prime p in the decomp of n!
#' @param p A prime
#' @param n An integer
multiplicity_in_factorial <- function(p, n) {
  # Adding epsilon to avoid rounding errors.
  # For example at p = 3, n = 243
  max_mul <- floor(log(n) / log(p) + 0.0001)
  prime_mul <- p ^ (1:max_mul)
  how_many_of_each <- map_dbl(prime_mul, ~ floor(n / .))
  sum(how_many_of_each)
}



decomp2 <- function(n) {
  p <- 2:n
  primes <- p[as.logical(vapply(p, is.prime, 1))]

  primes_mul <- map_dbl(primes, multiplicity_in_factorial, n)

  namen <- paste(ifelse(primes_mul > 1,
                        paste0(primes, "^", primes_mul),
                        primes),
                 collapse = " * ")

  return(namen)
}

check <- function(n) {
  decomp(n) == decomp2(n)
}

The idea is to loop on primes below n and figure out how often they appear in the factorial.

The key is that the multiplicity of p in n! is the sum of the multiplicities of p in k for k = 1..n.

To illustrate, n = 100 and p = 2. There are 50 multiples of 2 between 1 and 100. But this does not account for factors with multiplicity > 1.

We also have to consider the multiples of 4 (there are 25), of 8 (there are 12), of 16 (there are 6), of 32 (there are 3) and of 64 (there is 1).

This is what happens in multiplicity in factorial. The rest is straightforward.

The bottleneck on high values is the computation of primes which could be improved by using the sieve of Eratosthenes.

# https://gist.github.com/seankross/5946396
microbenchmark::microbenchmark(
   sieve = sieveOfEratosthenes(N),
   naive_filter = {
     p <- 2:N
     primes <- p[as.logical(vapply(p, is.prime, 1))]
   }
 )

Unit: microseconds
          expr      min        lq      mean    median       uq      max neval
         sieve  395.010  405.4015  423.2184  410.8445  439.629   584.71   100
  naive_filter 2875.782 2936.5195 3268.4576 2979.4925 3016.060 16875.81   100

But I wouldn't bother: the real bottleneck will be the string pasting which is notoriously slow.

On my laptop decomp2(10e5) takes a few seconds and decomp2(10e6) takes around 2 minutes. I am 99% certain the string pasting is actually the bottleneck in that case.

So, now, copy-pasting your solution: Time: 6629ms Passed: 90 Failed: 0 . It was 90 tests and I passed ~75. Thanks a lot. I did not use purrr before. — Krisselack, Aug 29 '19 at 09:46
You don't have to use`purrr`: `vapply` is really the same. To be fair, your code was efficient at running your algorithm. This is not really a code improvement, it is another algorithm! — asachet, Aug 29 '19 at 09:57
My program considered only primes smaller than the current number, but did not account for the factorizability of 1:n itself. I shamefully harvested the karma, but learned a lot with this exercise. Thank you for the fast and proficient help! — Krisselack, Aug 29 '19 at 10:03

score 1 · Answer 2 · answered Aug 29 '19 at 16:37

You can use package {primefactr} to do that:

> primefactr::ReducePrime(c(rep(0, 999), 1), out.summary = TRUE)
       [,1] [,2]
primes    2    5
power     3    3
> primefactr::ReducePrime(c(rep(0, 9999), 1), out.summary = TRUE)
       [,1] [,2]
primes    2    5
power     4    4
> primefactr::ReducePrime(c(rep(0, 999999), 1), out.summary = TRUE)
       [,1] [,2]
primes    2    5
power     6    6

R Programming Efficiency - factorial decomposition of a number

2 Answers2