import tensorflow as tf
import numpy as np

dim = 1000
x1 = tf.placeholder('float32', shape=(None, dim))
x2 = tf.placeholder('float32', shape=(None, dim))
# tf.sub was renamed tf.subtract in TF 1.0, and reduction_indices is the
# deprecated alias for axis
l2diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2)), axis=1))

vector1 = np.random.rand(1, 1000)
all_vectors = np.random.rand(500, 1000)

sess = tf.Session()
# there are no tf.Variables in this graph, so the initializer is a no-op here
sess.run(tf.global_variables_initializer())
distances = sess.run(l2diff, feed_dict={x1: vector1, x2: all_vectors})

The above code works well, but iterating over each vector one at a time takes too long. Is there a way to compute the same thing for multiple vectors at once, e.g. with vector1 = np.random.rand(10, 1000)? I prefer this over sklearn's Euclidean distance because I want to compute similarity for 100k vectors and run it on the GPU.

I also don't want to replicate all_vectors, because it already fills 70% of my machine's RAM.

Is there any way to calculate the distances by passing a batch of vectors?
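One standard way to handle the batch case without tiling all_vectors is the expansion ||a − b||² = ||a||² − 2 a·b + ||b||², which turns all pairwise distances into a single matrix multiply plus broadcasting. A minimal NumPy sketch of the idea (the same ops exist in TensorFlow as tf.matmul / tf.reduce_sum, so it maps directly onto a GPU graph):

```python
import numpy as np

def pairwise_l2(batch, all_vectors):
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, assembled by broadcasting:
    # (m, 1) + (n,) - (m, n)  ->  (m, n) matrix of squared distances
    sq_batch = np.sum(batch ** 2, axis=1)[:, None]          # shape (m, 1)
    sq_all = np.sum(all_vectors ** 2, axis=1)               # shape (n,)
    sq_dist = sq_batch + sq_all - 2.0 * batch @ all_vectors.T
    # clip tiny negatives caused by floating-point cancellation
    return np.sqrt(np.maximum(sq_dist, 0.0))

batch = np.random.rand(10, 1000)       # 10 query vectors at once
all_vectors = np.random.rand(500, 1000)
d = pairwise_l2(batch, all_vectors)    # shape (10, 500)
```

In TF 1.x terms the graph would be tf.sqrt(sq_batch + sq_all - 2 * tf.matmul(x1, x2, transpose_b=True)); peak memory is O(m·n) for the output rather than O(m·n·dim) for a tiled copy of all_vectors.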

talonmies
Pramod Patil
  • Do you need the exact Euclidean distance, or just a set of vectors ordered by distance from `vector1`? If it is the second, removing the `sqrt` operation will significantly speed up your algorithm and the order will be preserved (if `a > b` then `a^2 > b^2` for all positive `a, b`). You can then take the sqrt of only the vectors you need at the end. – jbird Mar 23 '17 at 14:46
  • http://stackoverflow.com/questions/37009647/compute-pairwise-distance-in-a-batch-without-replicating-tensor-in-tensorflow – Yaroslav Bulatov Mar 23 '17 at 16:03
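jbird's point about dropping the sqrt is easy to verify: squared distances preserve ordering, so for nearest-neighbour ranking you can argsort the squared values and take the root only for the few results you keep. A small NumPy illustration:

```python
import numpy as np

query = np.random.rand(1000)
all_vectors = np.random.rand(500, 1000)

# squared distances only -- no sqrt over all 500 rows
sq_dist = np.sum((all_vectors - query) ** 2, axis=1)
# ordering by squared distance equals ordering by true distance
nearest10 = np.argsort(sq_dist)[:10]
# take the sqrt only for the 10 results actually kept
top_dists = np.sqrt(sq_dist[nearest10])
```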

0 Answers