
I'm trying to implement the binarizer on page 4 of this paper. It's not a difficult function. It's simply this:

b(x) = +1 with probability (1 + x)/2, and b(x) = -1 otherwise, for x in [-1, 1]

No gradients are backpropagated through this function. I'm trying to do it in TensorFlow. There are two ways to go about it:

  1. Implementing it in C++ using TensorFlow. However, the instructions are quite unclear to me, and it would be great if someone could walk me through them. One thing I was unclear about: why is the gradient for ZeroOutOp implemented in Python? (See my guess in the sketch just after this list.)
  2. I decided to go with the pure Python approach.
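
From what I understand so far (and I'm not certain), only the forward kernel has to be written in C++; the gradient is ordinary graph construction, so it gets registered from Python. A rough sketch based on the zero_out op from the TensorFlow adding-an-op tutorial (the .so path and the gradient body are illustrative, not tested):

import tensorflow as tf
from tensorflow.python.framework import ops

# The compiled C++ kernel only defines the forward pass.
zero_out_module = tf.load_op_library('./zero_out.so')  # illustrative path

@ops.RegisterGradient("ZeroOut")
def _zero_out_grad(op, grad):
    # An ordinary Python function that builds graph ops: it receives the
    # forward op and the gradient w.r.t. its output, and returns the
    # gradient w.r.t. each input. The C++ side never sees this.
    return [tf.zeros_like(op.inputs[0])]  # placeholder gradient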

Here's the code for the pure Python approach:

import tensorflow as tf
import numpy as np

def py_func(func, inp, out_type, grad):
    # Register the gradient under a unique name, then remap the gradient
    # of the "PyFunc" op to it within the current graph.
    grad_name = "BinarizerGradients_Schin"
    tf.RegisterGradient(grad_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": grad_name}):
        return tf.py_func(func, inp, out_type)

'''
This is a hackish implementation to speed things up: instead of sampling,
it deterministically rounds P(b = 1) = (x + 1)/2 to {0, 1} and maps 0 to -1.
It doesn't directly follow the formula.
'''
def _binarizer(x):
    probability_matrix = (x + 1) / 2.0
    probability_matrix = np.round(probability_matrix)
    np.putmask(probability_matrix, probability_matrix == 0.0, -1.0)
    return probability_matrix.astype(np.float32)  # py_func expects float32

def binarizer(x):
    # Forward pass: the rounding above; backward pass: identity gradient.
    return py_func(_binarizer, [x], [tf.float32], _BinarizerNoOp)

def _BinarizerNoOp(op, grad):
    # Pass the incoming gradient straight through, untouched.
    return grad
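
A quick toy check of what the rounding does (inputs chosen by hand):

x = np.array([-0.9, -0.1, 0.1, 0.9], dtype=np.float32)
print(_binarizer(x))  # -> [-1. -1.  1.  1.]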

The problem happens here. Inputs are 32x32x3 CIFAR images, and they get reduced to 4x4x64 in the last layer, so my last layer has a shape of (?, 4, 4, 64), where ? is the batch size. I put it through the binarizer by calling:

binarized = binarizer.binarizer(h_pool3)
h_deconv1 = tf.nn.conv2d_transpose(binarized, W_deconv1, output_shape=[batch_size, img_height/4, img_width/4, 64], strides=[1,2,2,1], padding='SAME') + b_deconv1

The following error occurs:

ValueError: Shapes (4, 4, 64) and (?, 4, 4, 64) are not compatible

I can kind of guess why this happens: the ? represents the batch size, and after passing the last layer through the binarizer, the ? dimension seems to disappear.
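
My suspicion is that tf.py_func can't infer the output shape of an arbitrary Python function, so the returned tensor carries no static shape. If that's right, then since the binarizer is elementwise (output shape equals input shape), something like this might restore it, though I haven't verified it:

binarized = binarizer.binarizer(h_pool3)
# py_func drops static shape information; the binarizer is elementwise,
# so the output shape should equal the input shape.
binarized.set_shape(h_pool3.get_shape())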

  • Have you found a solution? – fabian789 Oct 28 '16 at 07:22
  • @fabian789 nope. Haven't been working on it recently. But really looking forward to someone pointing out some intermediate difficulty tutorials on implementing ops (apart from the one on the TF website). – jkschin Oct 29 '16 at 03:47

1 Answer

I think you can proceed as described in this answer. Applied to our problem:

def binarizer(input):
    # P(b = 1) = (1 + input) / 2, so input = 1 always maps to 1
    # and input = -1 always maps to -1.
    prob = tf.truediv(tf.add(1.0, input), 2.0)
    bernoulli = tf.contrib.distributions.Bernoulli(p=prob, dtype=tf.float32)
    # Samples are in {0, 1}; rescale them to {-1, 1}.
    return 2 * bernoulli.sample() - 1

Then, where you setup your network:

W_h1, bias_h1 = ...
h1_before_bin = tf.nn.tanh(tf.matmul(x, W_h1) + bias_h1)

# The interesting bits: in the forward pass the stop_gradient term adds
# binarizer(h1_before_bin) - t, so h1 evaluates to the binarized value;
# in the backward pass that term contributes no gradient, so gradients
# flow through t as if the binarizer were the identity.
t = tf.identity(h1_before_bin)
h1 = t + tf.stop_gradient(binarizer(h1_before_bin) - t)

However, I'm not sure how to verify that this works...
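
One sanity check that comes to mind (a sketch only; the input values are arbitrary): sum the output and confirm that the gradients with respect to the input are all ones, i.e. that they pass straight through the binarizer:

x = tf.constant([[0.3, -0.7, 0.0]])
t = tf.identity(x)
y = t + tf.stop_gradient(binarizer(x) - t)
g = tf.gradients(tf.reduce_sum(y), x)[0]

with tf.Session() as sess:
    y_val, g_val = sess.run([y, g])
    print(y_val)  # entries in {-1, 1}, sampled with P(1) = (1 + x) / 2
    print(g_val)  # all ones: the gradient passes straight through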
