
Is there a statistical difference between the following two methods of generating a series of paths for a Monte Carlo simulation? (By "path" I mean a vector of 350 normally distributed points.)

A)

for path = 1:300000
    Zn(path, :) = randn(1, 350); 
end

or the far more efficient B)

Zn = randn(300000, 350);

I just want to be sure there is no funny added correlation or dependence between the rows in method B that isn't present in method A. Like maybe method B distributes normally over 2 dimensions where A is over 1 dimension, so maybe that makes the two statistically different?

If there is a difference, then I need to know the same for uniform distributions (i.e. rand instead of randn).

Dan
  • have you considered to do hypothesis testing yourself? – moooeeeep Jan 30 '13 at 07:21
  • What would you suggest? Chi squared test for each row pair? – Dan Jan 30 '13 at 07:23
  • what's your hypothesis? *funny added correlation or dependence between the rows* is hard to quantify, but maybe test for correlation.... – thang Jan 30 '13 at 07:26
  • Yeah I'd be more worried about dependence I guess. Basically what I'm asking is if there are any statistical differences that I might be unaware of, so I don't really have a hypothesis at the moment :/ I'll try corrcoef on the row pairs though, see what happens. – Dan Jan 30 '13 at 07:30
  • yeah i think that the situation isn't as simple as it seems. i hope that the default parameters are good enough in terms of generating independent random numbers. you can of course pick your own method, but many pseudo random number generators do not generate a sequence as if drawn from an iid random process. the seed also matters. probably the best is to find a paper on this... where someone has spent a lot of time studying it. – thang Jan 30 '13 at 07:39

2 Answers


Just to add to the answer of @natan (+1), run the following code:

%# Store the current generator settings (type, seed and state)
Rng1 = rng;

%# Get a matrix of random numbers
X = rand(3, 3);

%# Restore the generator to the stored settings
rng(Rng1);

%# Get a matrix of random numbers one vector at a time
Y = nan(3, 3);
for n = 1:3
    Y(:, n) = rand(3, 1);
end

%# Test for differences
if any(any(X - Y ~= 0)); disp('Error'); end;

You'll note that there is no difference between X and Y. That is, there is no difference between building a matrix in one step, and building a matrix from a sequence of vectors.

However, there is a difference between my code and yours. Note that I am populating the matrix by columns, not rows, since when rand is used to construct a matrix in one step, it populates by column. By the way, I'm not sure if you realize this, but as a general rule you should try to perform vector operations on the columns of matrices, not the rows (MATLAB stores matrices in column-major order). I explained why in a response to a question on SO the other day; see here for more...
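To make the row-versus-column point concrete for the question's method A, here is a minimal sketch. It assumes that randn, like rand, draws its values sequentially from the generator and fills a matrix column by column; the names nPaths, nSteps, A and B are only illustrative, and the sizes are kept small. Under that assumption, filling a matrix row by row should give the transpose of a one-step call with the dimensions swapped.

```matlab
% Illustrative sizes (hypothetical, much smaller than in the question)
nPaths = 4;
nSteps = 3;

% Store the current generator settings
s = rng;

% Method A: fill the matrix one row (path) at a time
A = zeros(nPaths, nSteps);
for p = 1:nPaths
    A(p, :) = randn(1, nSteps);
end

% Restore the generator, then draw everything in one call.
% Assumption: randn fills column by column, so each column of the
% nSteps-by-nPaths matrix holds the draws that formed one row of A.
rng(s);
B = randn(nSteps, nPaths).';

% Display 1 if the two constructions agree exactly, 0 otherwise
disp(isequal(A, B))
```

If this displays 1, the two methods in the question consume the same stream of numbers and differ only in how those numbers are arranged in the matrix.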

Regarding the question of independence/dependence, one needs to be careful with the language one uses. The sequence of numbers generated by rand is perfectly dependent, since each number is fully determined by the generator's current state. For the vast majority of statistical tests, the numbers will appear to be independent. Nonetheless, in theory, one could construct a statistical test that would demonstrate the dependency between the numbers in a sequence generated by rand.
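One simple empirical check, along the lines of the corrcoef idea raised in the comments on the question, is to look at the sample correlations between rows. The sketch below is only illustrative: the sizes, the fixed seed 42 and the variable names are my own choices, and a pass here does not prove independence. It just shows that pairwise row correlations look like what one would expect from independent draws, namely centred on zero with a spread of roughly 1/sqrt(nSteps).

```matlab
% Fix the seed so the check is repeatable (the value 42 is arbitrary)
rng(42);

% Illustrative sizes: fewer paths than in the question to keep it fast
nPaths = 200;
nSteps = 350;

% Method B from the question: one call for the whole matrix
Zn = randn(nPaths, nSteps);

% corrcoef treats columns as variables, so transpose to correlate rows
R = corrcoef(Zn.');

% Keep only the off-diagonal entries (the correlations between distinct rows)
offDiag = R(~eye(nPaths));

% For independent rows, expect a mean near 0 and a spread near 1/sqrt(nSteps)
fprintf('mean correlation: %.4f\n', mean(offDiag));
fprintf('std of correlations: %.4f (1/sqrt(nSteps) = %.4f)\n', ...
    std(offDiag), 1 / sqrt(nSteps));
```

The same check can be run on a matrix built row by row (method A) to compare the two summaries side by side.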

Final thought, if you have a copy of Greene's "Econometric Analysis", he gives a neat discussion of random number generation in section 17.2.

Colin T Bowers
  • arg - I can't run this because MATLAB 2010a is currently installed, but I see what you're saying. Hopefully I'll get a chance to upgrade later this week to test it for myself, but for now I trust you that X and Y will be identical. – Dan Jan 30 '13 at 08:04
  • As for whether stuff is perfectly dependent or not, I'm not so concerned with that. In fact I'd prefer to use Quasi random numbers (which would fail most of the independence tests that the pseudo random numbers pass) but they're more hassle than it's worth to port to C# which I'll have to do. I just needed to make sure that those two methods were the same in matlab. Thanks. – Dan Jan 30 '13 at 08:06
  • @Dan Glad it helped. You sound like you're on top of things, but just in case you haven't heard of them, here is a link to some info on [Halton sequences](http://en.wikipedia.org/wiki/Halton_sequence) which may prove interesting to you. Cheers. – Colin T Bowers Jan 30 '13 at 08:08

Base R's random number generator behaves the same way: there doesn't appear to be any difference between generating a sequence of random numbers all at once and generating them one by one. Thus the behavior that @Colin T Bowers (+1) describes above also holds in R. Below is an R version of Colin's code:

# Set the seed
set.seed(1234)
# Generate a sequence of 10,000 random numbers at once
X <- rnorm(10000)
# Reset the seed
set.seed(1234)
# Create a vector of 10,000 zeros
Y <- rep(0, times = 10000)
# Generate a sequence of 10,000 random numbers, one at a time
for (i in 1:10000) {
  Y[i] <- rnorm(1)
}
# Test for differences
if (any(X - Y != 0)) print("Error")