To be precise, the loss function that I'm looking for is the squared error when the absolute error is lesser than 0.5, and it is the absolute error itself, when the absolute error is greater than 0.5. In this way, the gradient from the error function doesn't exceed 1 because once the gradient of the squared error function reaches 1, the absolute error function kicks in, and the gradient remains constant at 1. I've included my current implementation below. For some reason, it's giving me worse performance than just the squared error.
fn_choice_maker1 = (tf.to_int32(tf.sign(y - y_ + 0.5)) + 1)/2
fn_choice_maker2 = (tf.to_int32(tf.sign(y_ - y + 0.5)) + 1)/2
choice_maker_sqr = tf.to_float(tf.mul(fn_choice_maker1, fn_choice_maker2))
sqr_contrib = tf.mul(choice_maker_sqr, tf.square(y - y_))
abs_contrib = tf.abs(y - y_)-0.25 - tf.mul(choice_maker_sqr, tf.abs(y - y_)-0.25)
loss = tf.reduce_mean(sqr_contrib + abs_contrib)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
choice_maker_sqr
is a column tensor that is one whenever the error is between 0.5 and -0.5. The names are pretty self explanatory.