I have a problem computing a reduction over a function of a network represented by a large (200000x200000) pairwise distance matrix between points.
Minimal example, where the input X is a 200000x2 NumPy array of Cartesian coordinates:
import tensorflow as tf

x = tf.constant(X[:, 0], shape=[X.shape[0], 1])
y = tf.constant(X[:, 1], shape=[X.shape[0], 1])
dx = x - tf.transpose(x)              # pairwise x-differences, shape (N, N)
dy = y - tf.transpose(y)              # pairwise y-differences, shape (N, N)
D = tf.sqrt(dx * dx + dy * dy)        # full pairwise distance matrix
M = 0.1 * 5.0 / tf.pow(4.0 + D, 1.5)
res = tf.reduce_sum(M)
Running on the CPU, the memory (16GB on my MBP) is quickly oversubscribed and the system grinds to a halt. Presumably tf is trying to materialise the whole of D (and M?) in memory, which at 200000x200000 floats is hundreds of gigabytes.
If I were writing this in C/C++, I would most likely loop over the matrix rows, summing each row as I go and never storing the whole matrix. Ditto the GPU -- I'd subdivide the (virtual) matrix and perform the reduction in chunks.
Is there a trick to getting tf to follow a more chunk-wise behaviour, economising on memory?
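For reference, the chunked evaluation I have in mind looks roughly like the sketch below. It reuses X, x and y from the minimal example above; the block size B is an arbitrary choice of mine, and I'm not sure whether the runtime would actually release each block before evaluating the next.

B = 1000                                  # row-block size -- arbitrary choice, tune to memory
partial_sums = []
for start in range(0, X.shape[0], B):
    xb = x[start:start + B]               # (B, 1) block of x-coordinates
    yb = y[start:start + B]               # (B, 1) block of y-coordinates
    dx = xb - tf.transpose(x)             # (B, N) slice of the difference matrix
    dy = yb - tf.transpose(y)
    D = tf.sqrt(dx * dx + dy * dy)        # (B, N) slice of D, never the full (N, N) matrix
    M = 0.1 * 5.0 / tf.pow(4.0 + D, 1.5)
    partial_sums.append(tf.reduce_sum(M))
res = tf.add_n(partial_sums)              # combine the per-block partial sums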
Cheers,
Chris
EDIT:
An alternative approach which copes with the memory issue is to use tf.map_fn:
# i is one row of x; only the N rowsums are ever stored, not the full matrix
rowsums = tf.map_fn(lambda i: tf.reduce_sum(tf.sqrt(tf.reduce_sum(tf.pow(i - x, 2), 1))), x)
res = tf.reduce_sum(rowsums)
Thus only the rowsums are stored as a tensor, and not the full distance matrix. However, while this approach works well on the CPU, it grinds to a halt on the GPU.
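One variation I have been wondering about (untested speculation on my part) is to map over blocks of rows instead of single rows, so each map step hands the GPU a (B, N) slab of work rather than a single row. A sketch, assuming N is divisible by an arbitrarily chosen block size B:

B = 1000                                   # block size -- arbitrary choice, assumes N % B == 0
N = X.shape[0]
x_blocks = tf.reshape(x, [N // B, B, 1])   # (N/B, B, 1): one row block per map step
blocksums = tf.map_fn(
    lambda xb: tf.reduce_sum(tf.sqrt(tf.square(xb - tf.transpose(x)))),  # same row sums as above, one block at a time
    x_blocks)
res = tf.reduce_sum(blocksums)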