0

I'm interested in the implementation of a function mvncdf, (http://cn.mathworks.com/help/stats/mvncdf.html).

Specifically, I want to know its implementation of passing an array and returning an array. I want to know if it's done in a efficient way. Because I'm using numpy, and there is no counterpart in numpy with this.

ZK Zhao
  • 19,885
  • 47
  • 132
  • 206
  • Aside: though the interface isn't nice, you can use `mvn.mvnun` in scipy, see [here](http://stackoverflow.com/questions/30560176/multivariate-normal-cdf-in-scipy). – DSM Feb 07 '16 at 03:07
  • 1
    In what sense might the array passing be efficient, more or less so, in either language? Is that something distinct from the calculation method? – hpaulj Feb 07 '16 at 05:19
  • @DSM, thanks, I can't believe that I replied to that post several days ago. – ZK Zhao Feb 07 '16 at 08:08
  • @DSM, I know what's `wrong` with `mvn.mvnun` with `scipy` now, It's not a `cdf` in our common sense. The `cdf` we talk about don't invovle the `lower bound` at all, because it all come from the `-inf`. – ZK Zhao Feb 08 '16 at 01:21

2 Answers2

7

In MATLAB, type edit mvncdf.m. (This does not work for all functions, as they may be only distributed in compiled form.) If you do not have MATLAB, the octave implementation is available here.

zeeMonkeez
  • 5,057
  • 3
  • 33
  • 56
4

The Octave code can be seen at:

http://sourceforge.net/p/octave/statistics/ci/default/tree/inst/mvncdf.m

Why should there be anything special how it passes arrays, in or out, in either language? The Octave code parses a flexible set of inputs. The calculation work appears to be iterative. The time spent in calculations probably exceeds any interface work.


So the Octave version takes in some mix of

x,mu,sigma, a 

assigning defaults to the missing parameters. cases is the first dimension of x.

The meat of the function appears to be:

  c = chol (sigma)';
  p = zeros (cases, 1);       # array that it returns
  varsum = zeros (cases, 1);

  err = ones (cases, 1) .* err_eps;   # the other return
  # Apply crude Monte-Carlo estimation
  while any (err >= err_eps)
    # Sample from q-1 dimensional unit hypercube
    w = rand (cases, q - 1);

    # Transformation of the multivariate normal integral
    dvev = normcdf ([a(:, 1) / c(1, 1), x(:, 1) / c(1, 1)]);
    dv = dvev(:, 1);
    ev = dvev(:, 2);
    fv = ev - dv;
    y = zeros (cases, q - 1);
    for i = 1:(q - 1)
      y(:, i) = norminv (dv + w(:, i) .* (ev - dv));
      dvev = normcdf ([(a(:, i + 1) - c(i + 1, 1:i) .* y(:, 1:i)) ./ c(i + 1, i + 1), (x(:, i + 1) - c(i + 1, 1:i) .* y(:, 1:i)) ./ c(i + 1, i + 1)]);
      dv = dvev(:, 1);
      ev = dvev(:, 2);
      fv = (ev - dv) .* fv;
    endfor

    n++;
    # Estimate standard error
    varsum += (n - 1) .* ((fv - p) .^ 2) ./ n;
    err = gamma .* sqrt (varsum ./ (n .* (n - 1)));
    p += (fv - p) ./ n;
  endwhile

In Python terms it would end with return p, err. With the scipy equivalents of norminv and normcdf, translation to numpy should be straight forward. I don't see any efficiency issues dealing with implementation of passing an array and returning an array. p has the same number of rows as x, but otherwise is not a copy or view of x.

hpaulj
  • 221,503
  • 14
  • 230
  • 353