
I am having trouble turning 2 uncorrelated variables into correlated variables without using a transformation method (like the Cholesky method).

I have 2 original variables, say Original1 and Original2. The number of data points is 50 and the correlation between the two variables is -0.95.

Then I fit the variables to empirical distributions and generate 10,000 random numbers for both variables, say Random1 and Random2. The correlation between the two is close to 0.

Then I use the swap-based algorithm suggested by F. Jatpil in https://stats.stackexchange.com/questions/38856/how-to-generate-correlated-random-numbers-given-means-variances-and-degree-of

The algorithm in Python looks like this:

import random
import numpy as np

N = 10000  # length of Random1 and Random2

while True:
    # pick a random position n1
    n1 = random.randrange(N)
    a = Random1.iloc[n1] - Random1_Average
    b = Random2.iloc[n1] - Random2_Average

    while True:
        # pick a second position n2 != n1
        n2 = random.choice(list(range(n1)) + list(range(n1 + 1, N)))
        c = Random1.iloc[n2] - Random1_Average
        d = Random2.iloc[n2] - Random2_Average

        # swapping positions n1 and n2 in Random1 lowers the
        # correlation exactly when (a - c) * (b - d) > 0
        if (a - c) * (b - d) > 0:
            break

    Random1.iloc[n1], Random1.iloc[n2] = Random1.iloc[n2], Random1.iloc[n1]
    c_new = np.corrcoef(Random1, Random2)[0][1]
    if r_lowerlimit <= c_new <= r_upperlimit:
        break

Here, I check for (a - c) * (b - d) > 0 because that is the condition needed for the correlation to decrease after swapping positions n1 and n2 in Random1. After swapping, I check whether the new correlation lies between a lower limit and an upper limit (-1.00 and -0.90, in this case). If the new correlation falls in that range, then Random1 is now correlated with Random2 with a correlation similar to the original.
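To see why the condition works: swapping positions n1 and n2 in Random1 changes the numerator of the Pearson correlation by exactly -(a - c) * (b - d), while the means and variances are unchanged, so the correlation strictly decreases whenever (a - c) * (b - d) > 0. A minimal check on synthetic data (the arrays here are illustrative stand-ins, not the original Random1/Random2):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)

before = np.corrcoef(x, y)[0, 1]

# find any pair (n1, n2) that satisfies the swap condition;
# note (a - c) = x[n1] - x[n2] and (b - d) = y[n1] - y[n2],
# so the means cancel out of the condition
pairs = ((i, j) for i in range(len(x)) for j in range(len(x)) if i != j)
n1, n2 = next((i, j) for i, j in pairs
              if (x[i] - x[j]) * (y[i] - y[j]) > 0)

x[n1], x[n2] = x[n2], x[n1]

after = np.corrcoef(x, y)[0, 1]
assert after < before  # the swap strictly lowered the correlation
```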

There are 2 problems with this code right now:

  1. As you can see, this brute-force method takes a long time.
  2. Most of the time it finishes in under a minute. Sometimes, however, as the correlation gets close to the limit, it never exits the inner while loop because it has a hard time finding a good n2 value to swap.

What would be a good method to fix this problem? Thanks.
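For reference, one way to attack problem 1 (a sketch of my own, not code from the linked answer): since a swap changes the Pearson numerator by exactly -(a - c) * (b - d) and leaves the means and standard deviations untouched, the correlation can be updated in O(1) per swap instead of recomputing np.corrcoef over all 10,000 points. On synthetic data (array names and the target range [-1.00, -0.90] are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

xm, ym = x.mean(), y.mean()
sx = np.sqrt(((x - xm) ** 2).sum())
sy = np.sqrt(((y - ym) ** 2).sum())
cross = ((x - xm) * (y - ym)).sum()  # Pearson numerator (times n)

target_lo, target_hi = -1.00, -0.90
r = cross / (sx * sy)

while not (target_lo <= r <= target_hi):
    n1, n2 = rng.choice(len(x), size=2, replace=False)
    a, c = x[n1] - xm, x[n2] - xm
    b, d = y[n1] - ym, y[n2] - ym
    delta = -(a - c) * (b - d)
    if delta < 0:  # this swap lowers the correlation
        x[n1], x[n2] = x[n2], x[n1]
        cross += delta               # O(1) update, no full recompute
        r = cross / (sx * sy)

# the incrementally maintained r matches the full recomputation
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Swapping within x never changes its mean or standard deviation, which is why sx, sy, xm, and ym can be computed once up front.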

JungleDiff
  • It's a pretty inefficient method, compounded by the fact that you're computing a new correlation coefficient every time you do a swap (and generating a length-9999 list every time you sample `n2` to boot). Do you know the distribution of the original variables? – David Eisenstat Sep 11 '20 at 20:28
  • @DavidEisenstat Yes I know the distribution of the original variables. – JungleDiff Sep 12 '20 at 21:31
  • What is that distribution? – David Eisenstat Sep 12 '20 at 23:07
  • It's one of the distributions in this list that has the lowest SSE: https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python. – JungleDiff Sep 14 '20 at 00:30

0 Answers