I want to apply linear activation to most of my Keras model's output layer, and sigmoid activation to a set of "columns" which are interleaved with the other data in the tensor.
Following this post on writing custom activations, @jdehesa's answer in this post on sliced assignment, and this other post about sliced assignment, I wrote the following:
from keras.layers import Activation
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects
import tensorflow as tf
def selective_activation(x, start=0, end=None, skip_every=6):
    with tf.control_dependencies([x[:, start:end:skip_every].assign(K.sigmoid(x[:, start:end:skip_every]))]):
        x = tf.identity(x)
    return x
model = Sequential()
model.add(...bunch of layers...)
model.add(Dense(..., name="Final Layer"))
get_custom_objects().update({'selective_activation': Activation(selective_activation)})
model.add(Activation(selective_activation))
...
When I run this I get the error "ValueError: Sliced assignment is only supported for variables" on the line with the tf.control_dependencies context. I'm confused: how is my Keras layer's output NOT a Variable?
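A minimal check of the distinction the error is pointing at (a sketch, assuming TF 2.x eager mode; I take it the same distinction holds for graph-mode tensors):

```python
import tensorflow as tf

t = tf.zeros((2, 12))               # an ordinary tensor, like a layer's output
v = tf.Variable(tf.zeros((2, 12)))  # a Variable, which owns mutable storage

print(hasattr(t, "assign"))  # False: tensors are immutable values
print(hasattr(v, "assign"))  # True: only Variables support (sliced) assignment
```

So a layer's output is a plain (symbolic) tensor, not a Variable, and there is nothing for .assign to write into.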
Can someone suggest a way to implement the sort of assignment I'm trying to do?
I can only imagine three solutions:
- My currently-implemented workaround is to create two different output layers using the functional API, give each its own activation, concatenate them, and then multiply by a 'permutation matrix' (a matrix of 0's and 1's) to reorder the columns so that they end up where the rest of the code expects them (i.e. interleaved with the other linearly-activated variables). But this seems like an overly complex, verbose hack. (No need to submit an answer implementing this; I've already got it, but I don't like it.)
- Cook something up with tf.scatter_nd() or tf.scatter_update()...somehow?
- Rewrite everything else in the rest of the code to keep the 'existence' variables bunched together instead of interleaved with the other variables; that would be a lot of work I'm not eager to embark on.
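For the scatter-style idea, here is an assignment-free sketch of what I mean: instead of writing sigmoid values back into slices of x, blend two full-size tensors with a constant 0/1 column mask (a hedged sketch, TF 2.x; the mask construction, start=0, skip_every=6, and the width of 12 are illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

def selective_activation(x, start=0, skip_every=6):
    # Build a constant 0/1 mask over the last axis: 1 on the columns
    # that should get sigmoid, 0 on the columns left linear.
    width = int(x.shape[-1])  # requires a statically known output width
    mask_np = np.zeros((width,), dtype="float32")
    mask_np[start::skip_every] = 1.0
    mask = tf.constant(mask_np)
    # sigmoid where mask == 1, identity elsewhere -- no in-place assignment
    return mask * tf.sigmoid(x) + (1.0 - mask) * x

y = selective_activation(tf.zeros((1, 12)))
# columns 0 and 6 become sigmoid(0) = 0.5; all other columns stay 0.0
```

Wrapped in a Lambda or Activation layer, this would keep every column in place, so no permutation matrix or column reshuffling would be needed.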
(This is for an object detector, by the way, which previously used MSE loss for all variables; now I want cross-entropy loss for the 'does an object exist' category.)