41

I want to compute the pairwise squared distance of a batch of features in TensorFlow. I have a simple implementation using + and * operations by tiling the original tensor:

import tensorflow as tf

def pairwise_l2_norm2(x, y, scope=None):
    with tf.op_scope([x, y], scope, 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]
        # Tile x along a new last axis: (m, d) -> (m, d, n)
        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.pack([1, 1, size_y]))

        # Tile y the same way, then transpose so it lines up with xx: (n, d, m) -> (m, d, n)
        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.pack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])

        diff = tf.sub(xx, yy)
        square_diff = tf.square(diff)

        # Sum over the feature dimension -> (m, n)
        square_dist = tf.reduce_sum(square_diff, 1)

        return square_dist

This function takes as input two matrices of size (m, d) and (n, d) and computes the squared distance between each pair of row vectors. The output is a matrix of size (m, n) with elements `d_ij = dist(x_i, y_j)`.

The problem is that I have a large batch and high-dimensional features (large m, n, and d), so replicating the tensors consumes a lot of memory. I'm looking for another way to implement this that doesn't increase the memory usage and only stores the final distance tensor, something like double-looping over the original tensor.

jrabary
  • 2,411
  • 1
  • 28
  • 50
  • It's not clear that your code is doing `pairwise distance of a batch of features`. Can you specify the function you want to do more formally? Also, have you considered [tf.squared_difference](https://www.tensorflow.org/versions/r0.8/api_docs/python/math_ops.html#squared_difference)? – keveman May 03 '16 at 22:58
  • I updated the question to explain this. If you put a batch of features as input to this function, it should compute the distance between its rows. – jrabary May 04 '16 at 01:20

4 Answers

74

You can use some linear algebra to turn it into matrix ops. Note that what you need is a matrix D where a[i] is the ith row of your original matrix and

D[i,j] = (a[i]-a[j])(a[i]-a[j])'

You can rewrite that into

D[i,j] = r[i] - 2 a[i]a[j]' + r[j]

where r[i] is the squared norm of the ith row of the original matrix.

In a system that supports standard broadcasting rules you can treat r as a column vector and write D as

D = r - 2 A A' + r'

In TensorFlow you could write this as

A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)

# turn r into column vector
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)
sess = tf.Session()
sess.run(D)

result

array([[0, 2, 8],
       [2, 0, 2],
       [8, 2, 0]], dtype=int32)
Yaroslav Bulatov
  • 57,332
  • 22
  • 139
  • 197
  • Thank you. I better understand why broadcasting is interesting. – jrabary May 05 '16 at 09:08
  • Do you know if this approach is better than using `tf.expand_dims` to exploit broadcasting and then use `tf.squared_difference`? – Yamaneko Oct 14 '16 at 12:02
  • Not sure. There's a `transpose` in my solution which is costly. If you post a full solution as another answer, I could compare performance on a large matrix – Yaroslav Bulatov Oct 14 '16 at 19:20
  • 2
    I don't know how much of a performance improvement this provides, but `tf.matmul` has arguments for transposing arrays on the fly (`transpose_a` and `transpose_b`). – David J. Harris May 02 '17 at 17:44
  • 3
    ^ I recently tested this (on GPU, with TF 2.0). `tf.matmul(a, b, transpose_b=True)` was consistently ~40% faster than `tf.matmul(a, tf.transpose(b))`. – RGMyr Nov 21 '19 at 17:11
  • Is there a simple way to extend this to work for pairwise distances between two sets of points? D_ij = |X[i] - Y[j]|. @Yamaneko's answer can be easily adapted but I don't know how this one can. – fizzybear Jan 08 '20 at 19:59
  • 1
    I think it should be noted that this produces the square of the distance and not the distance itself. A simple sqrt at the end gets you the actual distance. I only mention that because of the title of the original Q. – Andrew White May 04 '21 at 11:52
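Following up on the `transpose_b` remarks in the comments above, here is a minimal sketch (not part of the original answer) of the same identity with the transpose fused into the matmul, so `tf.transpose(A)` is never materialized as a separate tensor:

import tensorflow as tf

A = tf.constant([[1., 1.], [2., 2.], [3., 3.]])

# Squared row norms as a column vector.
r = tf.reshape(tf.reduce_sum(A * A, 1), [-1, 1])

# transpose_b=True applies the transpose inside the matmul kernel.
D = r - 2 * tf.matmul(A, A, transpose_b=True) + tf.transpose(r)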
22

Using squared_difference:

def squared_dist(A): 
    expanded_a = tf.expand_dims(A, 1)
    expanded_b = tf.expand_dims(A, 0)
    distances = tf.reduce_sum(tf.squared_difference(expanded_a, expanded_b), 2)
    return distances

One thing I noticed is that this solution using tf.squared_difference gives me out of memory (OOM) errors for very large vectors, while the approach by @YaroslavBulatov doesn't. So decomposing the operation into matrix products yields a smaller memory footprint (I had assumed squared_difference would handle this better under the hood).
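For intuition on why this runs out of memory, here is a small sketch of my own (the sizes are made up): the two `expand_dims` calls broadcast against each other, so the full (n, n, d) difference tensor is materialized before the reduction.

import tensorflow as tf

A = tf.zeros([1000, 128])          # hypothetical sizes: n = 1000, d = 128

expanded_a = tf.expand_dims(A, 1)  # shape (1000, 1, 128)
expanded_b = tf.expand_dims(A, 0)  # shape (1, 1000, 128)

# Broadcasting materializes a (1000, 1000, 128) intermediate tensor,
# which is what exhausts memory for large n and d.
diff = tf.squared_difference(expanded_a, expanded_b)
distances = tf.reduce_sum(diff, 2) # shape (1000, 1000)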

Yamaneko
  • 3,433
  • 2
  • 38
  • 57
  • 3
    thanks for the information that the other solution is less memory intensive. good to know that. +1 for the great answer – lhk Nov 16 '16 at 12:45
  • 2
    This solution is also less computationally efficient. But it is very useful when matrix multiplication cannot be used (e.g. for absolute distance) – Evgeny Savinov Dec 18 '17 at 17:53
11

Here is a more general solution for two tensors of coordinates A and B:

def squared_dist(A, B):
  # Only the feature dimension has to match; A can be (m, d) and B can be (n, d).
  assert A.shape.as_list()[-1] == B.shape.as_list()[-1]

  row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
  row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.

  row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
  row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.

  return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B

Note that this is the squared distance. If you want the Euclidean distance, take tf.sqrt of the result. If you do that, don't forget to add a small constant to compensate for floating point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
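A quick usage sketch of that note (my own addition, using the squared_dist defined above; the shapes are hypothetical):

import tensorflow as tf

X = tf.random_normal([100, 32])
Y = tf.random_normal([200, 32])

d2 = squared_dist(X, Y)       # shape (100, 200), squared distances

# Euclidean distances; the small constant keeps the gradient of sqrt finite
# when a squared distance is numerically zero.
dist = tf.sqrt(d2 + 1e-6)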

Augustin
  • 2,444
  • 23
  • 24
1

If you want to compute a different metric, just change the corresponding tf ops.

def compute_euclidean_distance(x, y):
    # Requires statically known batch sizes.
    size_x = x.shape.as_list()[0]
    size_y = y.shape.as_list()[0]
    for i in range(size_x):
        # Compare the ith row of x against every row of y.
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d
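For reference, a usage sketch of my own (shapes are made up): this builds the distance matrix one row at a time, so no single op materializes an (m, n, d) tensor; each iteration only tiles one row of x.

import tensorflow as tf

x = tf.random_normal([100, 32])
y = tf.random_normal([200, 32])

d = compute_euclidean_distance(x, y)

with tf.Session() as sess:
    print(sess.run(d).shape)   # (100, 200)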