I'm trying to implement the binarizer in page 4 of this paper. It's not too difficult of a function. It's simply this:
No gradients to be backpropagated for this function. I'm trying to do it in TensorFlow. There are two ways to go about it:
- Implementing it in C++ using TensorFlow. However, the instructions are quite unclear to me. It would be great if someone could walk me through it. One thing that I was unclear was why is the gradient for ZeroOutOp implemented in Python?
- I decided to go with the pure Python approach.
Here's the code:
import tensorflow as tf
import numpy as np
def py_func(func, inp, out_type, grad):
grad_name = "BinarizerGradients_Schin"
tf.RegisterGradient(grad_name)(grad)
g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": grad_name}):
return tf.py_func(func, inp, out_type)
'''
This is a hackish implementation to speed things up. Doesn't directly follow the formula.
'''
def _binarizer(x):
probability_matrix = (x + 1) / float(2)
probability_matrix = np.matrix.round(probability_matrix, decimals=0)
np.putmask(probability_matrix, probability_matrix==0.0, -1.0)
return probability_matrix
def binarizer(x):
return py_func(_binarizer, [x], [tf.float32], _BinarizerNoOp)
def _BinarizerNoOp(op, grad):
return grad
The problem happens here. Inputs are 32x32x3 CIFAR images and they get reduced to 4x4x64 in the last layer. My last layer has a shape of (?, 4, 4, 64), where ? is the batch size. After putting it through this by calling:
binarized = binarizer.binarizer(h_pool3)
h_deconv1 = tf.nn.conv2d_transpose(h_pool3, W_deconv1, output_shape=[batch_size, img_height/4, img_width/4, 64], strides=[1,2,2,1], padding='SAME') + b_deconv1
The following error occurs:
ValueError: Shapes (4, 4, 64) and (?, 4, 4, 64) are not compatible
I can kinda guess why this happens. The ? represents the batch size and after putting the last layer through the binarizer, the ? dimension seems to disappear.