
How do I implement this metric in Keras? My code below gives the wrong result! Note that I'm undoing a previous log(x + 1) transformation via exp(x) - 1, and that negative predictions are clipped to 0:

def rmsle_cust(y_true, y_pred):
    first_log = K.clip(K.exp(y_pred) - 1.0, 0, None)
    second_log = K.clip(K.exp(y_true) - 1.0, 0, None)
    return K.sqrt(K.mean(K.square(K.log(first_log + 1.) - K.log(second_log + 1.)), axis=-1))

For comparison, here's the standard numpy implementation:

def rmsle_cust_py(y, y_pred, **kwargs):
    # undo the log(x + 1) transformation
    y = np.exp(y) - 1
    y_pred = np.exp(y_pred) - 1

    y_pred[y_pred < 0] = 0.0
    to_sum = [(math.log(y_pred[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i in range(len(y_pred))]
    return (sum(to_sum) * (1.0 / len(y))) ** 0.5

What am I doing wrong? Thanks!

EDIT: Setting axis=0 seems to give a value very close to the correct one, but I'm not sure, since all the code I've seen uses axis=-1.
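(For reference, a small numpy sketch of my own, not part of the original question, showing how the two axis choices reduce differently when the array has a trailing unit dimension, as a Keras model with a single output unit would produce:)

```python
import numpy as np

# Hypothetical squared-log-error values with shape (length, 1),
# mimicking what a single-output Keras model yields per batch.
sq_err = np.array([[0.04], [0.01], [0.09]])

# axis=-1 reduces over the last axis (size 1): one value per sample.
per_sample = np.sqrt(np.mean(sq_err, axis=-1))   # shape (3,)

# axis=0 reduces over the batch: a single aggregated value.
aggregated = np.sqrt(np.mean(sq_err, axis=0))    # shape (1,)

print(per_sample.shape, aggregated.shape)
```

So with a (length, 1) output, axis=-1 does not aggregate over the batch at all, while axis=0 does.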

Fernando

2 Answers


I ran into the same problem and searched for it; here is what I found:

https://www.kaggle.com/jpopham91/rmlse-vectorized

After modifying it a bit, this seems to work for me; the rmsle_K method is implemented with Keras and TensorFlow.

import numpy as np
import math
from keras import backend as K
import tensorflow as tf

# Vectorized numpy implementation
def rmsle(y, y0):
    assert len(y) == len(y0)
    return np.sqrt(np.mean(np.power(np.log1p(y) - np.log1p(y0), 2)))

# Loop-based reference implementation
def rmsle_loop(y, y0):
    assert len(y) == len(y0)
    terms_to_sum = [(math.log(y0[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i in range(len(y0))]
    return (sum(terms_to_sum) * (1.0 / len(y))) ** 0.5

# Keras/TensorFlow implementation (TF 1.x API)
def rmsle_K(y, y0):
    return K.sqrt(K.mean(K.square(tf.log1p(y) - tf.log1p(y0))))

r = rmsle(y=[5, 20, 12], y0=[8, 16, 12])
r1 = rmsle_loop(y=[5, 20, 12], y0=[8, 16, 12])
r2 = rmsle_K(y=[5., 20., 12.], y0=[8., 16., 12.])

print(r)
print(r1)

sess = tf.Session()
print(sess.run(r2))

Result:

Using TensorFlow backend
0.263978210565
0.263978210565
0.263978
sachin dubey
LYu
  • Thanks, but what about the exp(x) - 1 transformation? – Fernando Dec 03 '17 at 15:10
  • @Fernando I don't think you need that transformation though https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError – LYu Dec 03 '17 at 18:21
  • I need because my model fits log(x + 1), so I need to transform back via exp(x) - 1, then apply RMSLE. – Fernando Dec 04 '17 at 12:53
  • Won't minimizing log error also minimize real error, as the log function is a monotonic transformation? – Saedeas Dec 08 '17 at 02:58
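(The round trip discussed in these comments can be checked directly: if the model's targets are already log(x + 1) values, then undoing the transform with expm1 and re-applying log1p is the identity, so RMSLE on the original scale equals plain RMSE on the transformed values. A quick numpy check of my own, not part of the original answer:)

```python
import numpy as np

# Hypothetical log-transformed values, i.e. log(x + 1) for some x >= 0.
y_log = np.array([0.5, 1.2, 2.0])

# Undoing the transform and re-applying it recovers the same values,
# since log1p(expm1(v)) == v.
round_trip = np.log1p(np.expm1(y_log))
print(np.allclose(round_trip, y_log))  # → True
```

The clipping step only matters when a prediction maps below 0 on the original scale, i.e. when the log-space prediction is negative.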

Since the numpy implementation builds a plain Python list (to_sum), I suspect your numpy array has shape (length,).

And in Keras, since you get different results with axis=0 and axis=-1, you probably have a shape like (length, 1).

Also, when creating the to_sum list, you use y[i] and y_pred[i], which means you're indexing along axis=0 in the numpy implementation.

The numpy implementation also sums everything when calculating the mean in sum(to_sum). So you really don't need to pass any axis to K.mean.

If you make sure your model's output shape is either (length,) or (length, 1), you can just use K.mean(value) without the axis parameter.
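(A possible source of the small numeric discrepancy with axis=-1: Keras averages the per-sample metric values afterwards, so you end up with the mean of square roots rather than the square root of the overall mean. A numpy sketch of my own illustrating the difference:)

```python
import numpy as np

# Hypothetical squared log errors for a batch, shape (length, 1).
sq = np.array([[0.04], [0.01], [0.09]])

# axis=-1: sqrt per sample, then Keras averages these values.
mean_of_sqrts = np.mean(np.sqrt(np.mean(sq, axis=-1)))

# No axis: the square root of the overall mean -- the true RMSLE.
sqrt_of_mean = np.sqrt(np.mean(sq))

print(mean_of_sqrts, sqrt_of_mean)  # slightly different values
```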

Daniel Möller
  • I noticed that, but Keras is giving a slightly different result for the same (y, y_pred) pair. I have no idea why. – Fernando Dec 04 '17 at 13:19