
I am currently implementing a demand forecasting solution using deep learning in PyTorch with multi-horizon output, and I am experimenting with PyTorch Forecasting (PTF) models (N-BEATS, N-HiTS, DeepAR, etc.), using the TimeSeriesDataSet and DataLoader structures. The target distribution is mostly zeros, so I chose to experiment with TweedieLoss(). Because of the way my project is structured, I clean, impute and scale the data before creating the datasets/dataloaders, so that I can also experiment with TensorFlow and/or non-Lightning PyTorch model architectures (using NumPy arrays as the base data structure before converting to tensors). I have the following questions:

  1. To my understanding, PTF recommends the following setup for using the Tweedie loss (as is apparent here): use a TimeSeriesDataSet with target_normalizer=EncoderNormalizer(transformation=dict(forward=torch.log1p)), which applies log(x+1) to the target variable as a transformation, without rescaling (exp) back to the original scale afterwards. I presume this is because the TweedieLoss() object does this itself in its to_prediction() function (it takes the model prediction in log scale, rescales it back to the original scale and computes the loss metric). Is my intuition correct? Am I missing something?

  2. If so, wouldn't it be equivalent to manually apply log(x+1) to my target variable, drop the EncoderNormalizer() from my TimeSeriesDataSet entirely, and rely on TweedieLoss() to rescale everything back to the original scale before computing the loss? The latter setup throws the "Loss is not finite. Resetting it to 1e9" error.
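To make question 2 concrete, here is a minimal sketch. This is a local re-implementation of the PTF-style Tweedie kernel, not PTF's actual code; the power p = 1.5 and all tensor values are made-up assumptions. It shows that pre-transforming the target is not the same objective as letting the normalizer transform it, and that raw-scale predictions blow up the exp() terms into a non-finite loss:

```python
import torch

def tweedie_nll(y_pred_log, y_true, p=1.5):
    # PTF-style Tweedie kernel (local sketch): assumes the model output is
    # log(mu), i.e. mu = exp(y_pred_log), and y_true is on the ORIGINAL scale.
    a = y_true * torch.exp(y_pred_log * (1 - p)) / (1 - p)
    b = torch.exp(y_pred_log * (2 - p)) / (2 - p)
    return (-a + b).mean()

y_raw = torch.tensor([0.0, 3.0, 10.0])  # zero-inflated demand, original scale
mu = torch.tensor([0.5, 2.0, 9.0])      # hypothetical predicted means

# Normalizer-style setup: network predicts log(mu), target stays on raw scale.
ok = tweedie_nll(torch.log(mu), y_raw)

# Manually log1p-ing the target changes the objective: the loss still treats
# y_true as raw counts, so the two setups are NOT equivalent.
pre = tweedie_nll(torch.log(mu), torch.log1p(y_raw))

# Raw-scale (not log-scale) predictions overflow exp() and yield a non-finite
# loss, one plausible source of "Loss is not finite. Resetting it to 1e9".
bad = tweedie_nll(torch.tensor([500.0]), torch.tensor([3.0]))

print(ok.item(), pre.item(), bad.item())  # bad is inf
```

The point of the sketch is only that the target's scale and the prediction's scale must match what the loss expects; it is not a claim about PTF internals.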

  3. I found two implementations of TweedieLoss/Deviance:

  • Wikipedia Version (from here):
    part1 = torch.exp(torch.clamp(y_true, min=0.) * (2 - self.p)) / ((1 - self.p) * (2 - self.p))
    part2 = y_true * torch.exp(y_pred*(1-self.p))/(1-self.p)
    part3 = torch.exp(y_pred*(2-self.p))/(2-self.p)
    loss = -(part1 - part2 + part3)
  • PTF GitHub implementation (from here):
    a = y_true * torch.exp(y_pred * (1 - self.p)) / (1 - self.p)
    b = torch.exp(y_pred * (2 - self.p)) / (2 - self.p)
    loss = -a + b

The difference, I guess, is that Wiki's part1 is missing from the PTF version. Note that I backported the loss from the latest PyTorch Forecasting package (0.10.3, which requires Python 3.10) to Google Colab's environment (Python 3.7, PTF 0.10.1).
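For a quick numeric check, the two snippets can be re-typed verbatim as free-standing expressions (p hardcoded to 1.5 and the sample tensors made up here). Since part1 depends only on y_true, it is constant with respect to the prediction; but note that, as quoted, the two snippets also disagree on the overall sign of the shared terms, so the sign convention of the original sources is worth double-checking:

```python
import torch

p = 1.5
y_true = torch.tensor([0.0, 1.0, 4.0])
y_pred = torch.tensor([-1.0, 0.5, 1.5])  # log-scale predictions

# Wikipedia snippet, as quoted in the question:
part1 = torch.exp(torch.clamp(y_true, min=0.0) * (2 - p)) / ((1 - p) * (2 - p))
part2 = y_true * torch.exp(y_pred * (1 - p)) / (1 - p)
part3 = torch.exp(y_pred * (2 - p)) / (2 - p)
wiki_loss = -(part1 - part2 + part3)

# PTF snippet, as quoted in the question:
a = y_true * torch.exp(y_pred * (1 - p)) / (1 - p)
b = torch.exp(y_pred * (2 - p)) / (2 - p)
ptf_loss = -a + b

# a == part2 and b == part3, so algebraically
# wiki_loss = -(ptf_loss + part1): the difference is not just the
# prediction-independent part1 term but also a flipped sign.
print(torch.allclose(wiki_loss + ptf_loss, -part1))  # True
```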

Again, to my understanding (and sources from here), Wikipedia's equation is the Tweedie deviance, which equals -2·LL (log-likelihood); minimizing the Tweedie deviance and the negative log-likelihood should therefore be equivalent objectives. Finally, I tried the two loss implementations on hardcoded pairs of y_pred and y_true tensors (including edge cases like torch.zeros(), where both predictions and targets are all 0s). The loss looks normal, yet I still cannot train the model on my own data. Any help will be highly appreciated!
