2

I need to compute the factorial of a variable in Google BigQuery - is there a function for this? I cannot find one in the documentation here:

https://cloud.google.com/bigquery/query-reference#arithmeticoperators

My proposed solution at this point is to compute the factorial for numbers 1 through 100 and upload that as a table and join with that table. If you have something better, please advise.

As context may reveal a best solution, the factorial is used in the context of computing a Poisson probability of a random variable (number of events in a window of time). See the first equation here: https://en.wikipedia.org/wiki/Poisson_distribution

cgnorthcutt
  • 3,890
  • 34
  • 41
  • you can easily use java script udf for this - https://cloud.google.com/bigquery/user-defined-functions. – Mikhail Berlyant Sep 30 '15 at 22:48
  • 1
    Up to 100! will be challenging. BigQUery uses 64 bit integer, which is good up until 20! but 21! already doesn't fit. – Mosha Pasumansky Sep 30 '15 at 23:12
  • Sometimes people use factorials as a building block for something else that doesn't blow up as fast as factorials, such as computing (n choose k) for small k. If you happen to be doing something like that, please add to the question, since you can get a better solution that way. – eubrandt Oct 01 '15 at 21:45
  • @MoshaPasumansky I have solved this for up to any number, not just a hundred. See posting and comment below. – cgnorthcutt Oct 01 '15 at 22:56

4 Answers4

4

Try below. Quick & dirty example

select number, factorial 
FROM js(
// input table
(select number from
(select 4 as number),
(select 6 as number),
(select 12 as number)
),
// input columns
number,
// output schema
"[{name: 'number', type: 'integer'},
{name: 'factorial', type: 'integer'}]",
// function
"function(r, emit){
  function fact(num)
  {
      if(num<0)
       return 0;
      var fact=1;
      for(var i=num;i>1;i--)
        fact*=i;
      return fact;
   }

  var factorial = fact(r.number)

  emit({number: r.number,  factorial: factorial});
}"
)
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • 1
    Close! Now plug in 25 for 12 in your example and you'll see it fails. Also we want to compute this for all numbers, and we don't want to have to type out all the numbers over and over. Using your code as a backbone, I've got this working. As before, I've added it as an answer - see what you think! – cgnorthcutt Oct 01 '15 at 22:58
  • I think, it is an art to provide answer in such a way that you can make some few final moves by yourself to make it works for exactly your case. I am glad you got this work :) Still that select row_numbers ... is not quite generic - you can improve it to eliminate any table use at all while still generating as many numbers as you need. let me know if you want to see this – Mikhail Berlyant Oct 02 '15 at 00:55
  • Handling large numbers across platforms can be tricky. The trick is string exponential, then back to float. Sure, that would be enlightening! – cgnorthcutt Oct 02 '15 at 01:43
2

If the direct approach works for the values you need the Poisson distribution calculated on, then cool. If you reach the point where it blows up or gives you inaccurate results, then read on for numerical analysis fun times.

In general you'll get better range and numerical stability if you do the arithmetic on the logarithms, and then exp() as the final operation.

  1. You want: c^k / k! exp(-c).
  2. Compute its log, ln( c^k / k! exp(-c) ),
    i.e. k ln(x) - ln(k!) - c
  3. Take exp() of that.

Well but how do we get ln(k!) without computing k!? There's a function called the gamma function, whose practical point here is that its logarithm gammaln() can be approximated directly, and ln(k!) = gammaln(k+1).

There is a Javascript gammaln() in Phil Mainwaring's answer here, which I have not tested, but assuming it works it should fit into a UDF for you.

Community
  • 1
  • 1
eubrandt
  • 239
  • 1
  • 4
0

Extending Mikhail's answer to be general and correct for computing the factorial for all number 1 to n, where n < 500, the following solution holds and can be computed efficiently:

select number, factorial 
FROM js(
// input table
(
  SELECT
    ROW_NUMBER() OVER() AS number, 
    some_thing_from_the_table
  FROM
    [any table with at least LIMIT many entries]
  LIMIT
    100 #Change this to any number to compute factorials from 1 to this number
),
// input columns
number,
// output schema
"[{name: 'number', type: 'integer'},
{name: 'factorial', type: 'float'}]",
// function
"function(r, emit){
  function fact(num)
  {
      if(num<0)
       return 0;
      var fact=1;
      for(var i=num;i>1;i--)
        fact*=i;
      return fact;
   }

  #Use toExponential and parseFloat to handle large integers in both Javascript and BigQuery
  emit({number: r.number,  factorial: parseFloat(fact(r.number).toExponential())});
}"
)
cgnorthcutt
  • 3,890
  • 34
  • 41
  • But now try n = 500. (Javascript floats are IEEE double, so they range up to about 2^1024. 500! is about 2^3767.) – eubrandt Oct 07 '15 at 22:27
0

You can get up to 27! using SQL UDF. Above that value NUMERIC type gets overflow error.

CREATE OR REPLACE FUNCTION factorial(integer_expr INT64) AS ( (
    SELECT
      ARRAY<numeric>[ 
      1,
      1,
      2,
      6,
      24,
      120,
      720,
      5040,
      40320,
      362880,
      3628800,
      39916800,
      479001600,
      6227020800,
      87178291200,
      1307674368000,
      20922789888000,
      355687428096000,
      6402373705728000,
      121645100408832000,
      2432902008176640000,
      51090942171709440000.,
      1124000727777607680000.,
      25852016738884976640000.,
      620448401733239439360000.,
      15511210043330985984000000.,
      403291461126605635584000000.,
      10888869450418352160768000000.][
    OFFSET
      (integer_expr)] AS val ) );

  select factorial(10);
Anna
  • 49
  • 7