
I am trying to correlate two matrices column-wise, i.e. correlate the 1st column of the 1st matrix with the 1st column of the 2nd matrix, and so on. In NumPy I do:

np.corrcoef(x, y, rowvar=False)

And it works great. What would be the Tensorflow equivalent of that command?

I tried using streaming_pearson_correlation, but that correlates all the columns together instead of providing a result per column.

As a last resort I'm considering splitting the tensor into separate column tensors, but I'm guessing this will have a performance cost.

I know that I can wrap numpy in a py_func, but then it won't run on a GPU.

Thanks in advance for the help.

Dave R

1 Answer


The NumPy documentation page for corrcoef gives the relationship between the correlation coefficient matrix and the covariance matrix. So the natural approach is to first rewrite it in terms of matmuls in NumPy:

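import numpy as np

fsize = 1  # number of rows (variables) in each matrix
dsize = 3  # number of observations per variable
x = np.random.random((fsize, dsize))
y = np.random.random((fsize, dsize))
xy = np.concatenate([x, y], axis=0)
# corrcoef(x, y) stacks x and y row-wise internally, so both calls agree
assert (np.corrcoef(xy) == np.corrcoef(x, y)).all()
mean = np.mean(xy, axis=1, keepdims=True)
cov = ((xy - mean) @ (xy - mean).T) / (dsize - 1)
# diagonal matrix of 1/std, used to rescale covariance into correlation
cov2 = np.diag(1 / np.sqrt(np.diag(cov)))
np.testing.assert_allclose(cov2 @ cov @ cov2, np.corrcoef(x, y))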

Now convert to TensorFlow and check that the result is the same:

import tensorflow as tf  # TensorFlow 1.x API (sessions, tf.diag)

def t(x): return tf.transpose(x)
sess = tf.InteractiveSession()

x_t = tf.constant(x)
y_t = tf.constant(y)
xy_t = tf.concat([x_t, y_t], axis=0)
# keep_dims is the older TF 1.x spelling; later releases call it keepdims
mean_t = tf.reduce_mean(xy_t, axis=1, keep_dims=True)
cov_t = ((xy_t - mean_t) @ t(xy_t - mean_t)) / (dsize - 1)
cov2_t = tf.diag(1 / tf.sqrt(tf.diag_part(cov_t)))
cor = cov2_t @ cov_t @ cov2_t

np.testing.assert_allclose(np.corrcoef(x, y), cor.eval())

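The correlations between the variables that constitute x and those that constitute y are in the off-diagonal blocks of this matrix. Note that, like np.corrcoef with its default settings, this treats each row as a variable, so transpose the inputs first if your variables are stored in columns.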
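If only the per-column pairs are needed (column i of x against column i of y), the full matrix can be skipped. Below is a minimal NumPy sketch of that idea, not something from the code above: the helper name columnwise_corr is made up for illustration, and it assumes x and y have the same shape with observations in rows. It uses only means, sums, and a square root, so a TensorFlow port with tf.reduce_mean, tf.reduce_sum, and tf.sqrt should be straightforward, though that port is not tested here.

```python
import numpy as np

def columnwise_corr(x, y):
    """Pearson correlation between matching columns of x and y.

    Hypothetical helper: x and y are assumed to be (n_obs, n_cols) arrays
    of identical shape. Returns a vector of length n_cols.
    """
    # Center each column
    xc = x - x.mean(axis=0, keepdims=True)
    yc = y - y.mean(axis=0, keepdims=True)
    # Per-column covariance numerator and product of norms;
    # the 1/(n-1) factors cancel, so they are omitted
    num = (xc * yc).sum(axis=0)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    return num / den

rng = np.random.RandomState(0)
x = rng.random_sample((50, 4))
y = rng.random_sample((50, 4))
r = columnwise_corr(x, y)

# Cross-check against np.corrcoef: with rowvar=False the columns are the
# variables, and the top-right k x k block holds the x-vs-y correlations
k = x.shape[1]
full = np.corrcoef(x, y, rowvar=False)
np.testing.assert_allclose(r, np.diag(full[:k, k:]))
```

This does O(n * k) work instead of building a 2k x 2k matrix, which matters when there are many columns.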

Yaroslav Bulatov