
The equation that I'm working with is as follows:

[image: the equation]

The description says that x-bar and y-bar are the averages of array 1 and array 2, respectively. The minimum coefficient is 0.3.

I'm asking because I am not very familiar with reading statistical equations, let alone implementing them in Python.

Ravaal
  • Take a look here: https://stackoverflow.com/questions/3949226/calculating-pearson-correlation-and-significance-in-python – Maarten-vd-Sande Jun 06 '18 at 14:49
  • Or would you like to implement it yourself? – Maarten-vd-Sande Jun 06 '18 at 14:50
  • Is this the same equation as the Pearson correlation? If it is, then there's no need to implement it myself. If you provide some code as an answer, I'll test it and mark your response as the answer. – Ravaal Jun 06 '18 at 14:55

2 Answers


The easiest way is to use scipy.stats.pearsonr:

import numpy as np
from scipy.stats import pearsonr

x = np.random.random(20)
y = np.random.random(20)

print(pearsonr(x, y))

This will give you two values, the correlation and the p-value.

You can implement it yourself like this:

x = np.random.random(20)
y = np.random.random(20)
x_bar = np.mean(x)
y_bar = np.mean(y)

top = np.sum((x - x_bar) * (y - y_bar))
bot = np.sqrt(np.sum(np.power(x - x_bar, 2)) * np.sum(np.power(y - y_bar, 2)))

print(top/bot)
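
For reference, the manual version above appears to compute the standard Pearson correlation coefficient, with top as the numerator and bot as the denominator. The index i and the array length n are notation assumed here, not taken from the question:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```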

For the same x and y, both approaches give the same correlation, up to floating-point rounding. Good luck!

Maarten-vd-Sande

A straightforward pure-Python implementation, looping over the elements with generator expressions, would be:

import math

def correlation(x, y):
    # Both inputs must have the same length; check before doing any work.
    assert len(x) == len(y)

    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)

    # Sums of squared deviations from the mean (not yet divided by n).
    var_x = sum((x_i - x_bar)**2 for x_i in x)
    var_y = sum((y_i - y_bar)**2 for y_i in y)

    numerator = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    denominator = math.sqrt(var_x * var_y)
    return numerator / denominator

if __name__ == "__main__":
    x = [...]
    y = [...]
    print(correlation(x, y))
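
As a rough sketch of how this could be tied back to the 0.3 minimum mentioned in the question: the data below is made up for illustration, and meets_minimum is a hypothetical helper. It assumes that "minimum coefficient" means the signed coefficient must be at least 0.3; if the magnitude is what matters, compare abs(r) instead.

```python
import math

def correlation(x, y):
    # Same formula as above, repeated so this sketch runs on its own.
    assert len(x) == len(y)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y)
    )
    return numerator / denominator

def meets_minimum(x, y, minimum=0.3):
    # Hypothetical helper (assumption): treats 0.3 as a lower bound on r.
    return correlation(x, y) >= minimum

# Illustrative data, not from the question.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = correlation(x, y)
print(r)                    # 0.8
print(meets_minimum(x, y))  # True
```

Note that the function divides by zero if either input is constant, since the correlation is undefined in that case.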

For heavy numeric work, one usually uses NumPy, which already provides this as np.corrcoef. It returns the full correlation matrix, so the [0, 1] entry is the coefficient between x and y:

import numpy as np

if __name__ == "__main__":
    x = np.array([...])
    y = np.array([...])
    print(np.corrcoef(x, y)[0, 1])
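
To make the indexing concrete, here is a small sketch with made-up data (not from the question) in which y is an exact linear function of x, so the coefficient should come out as 1:

```python
import numpy as np

# Illustrative data: y depends linearly on x with a positive slope.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

matrix = np.corrcoef(x, y)  # 2x2 matrix; rows and columns correspond to x and y
r = matrix[0, 1]            # off-diagonal entry: correlation between x and y

print(matrix.shape)  # (2, 2)
print(round(r, 6))   # 1.0
```

The diagonal entries are the correlation of each array with itself, and the matrix is symmetric, so matrix[1, 0] holds the same value as matrix[0, 1].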
Graipher