
I have a tensorflow model whose outputs correspond to coefficients of multiple polynomials. Note that my model actually has another set of outputs (multi-output), but I've mocked this below by simply returning the input in addition to the polynomial coefficients.

I'm having a lot of trouble training the model, related to tensor shapes. I've verified that the model is able to predict on sample inputs and that the loss function works on sample outputs, but during training it immediately throws an error (see below).

The model takes in a fixed-size embedding for each input and outputs coefficients for 2 polynomials of degree 2. For example, the output on a single input can look like:

[array([[[1, 2, 3],
         [4, 5, 6]]]),
 [...]]

corresponding to polynomials [1*x^2+2*x+3, 4*x^2+5*x+6]. Note that I've hidden the second output.

I noticed that tf.math.polyval requires a list of coefficients, making it wonky with AutoGrad. So, I implemented my own version of Horner's algorithm with pure tensors.

import numpy as np
import tensorflow as tf
import logging
import tensorflow.keras as K

@tf.function
def tensor_polyval(coeffs, x):
    """
    Calculates polynomial scalars from tensor of polynomial coefficients
    TensorFlow's tf.math.polyval requires a Python list of coefficients, which isn't compatible with autograd

    # Inputs:
      - coeffs (NxD Tensor): each row of coeffs corresponds to r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D]
      - x: Scalar!

    # Output:
      - r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D] for row in coeffs
    """
    p = coeffs[:, 0]
    for i in range(1,coeffs.shape[1]):
      tf.autograph.experimental.set_loop_options(
        shape_invariants=[(p, tf.TensorShape([None]))])
      c = coeffs[:, i]
      p = tf.add(c, tf.multiply(x, p))
    return p

@tf.function
def coeffs_to_poly(coeffs, n):
    # Converts a NxD array of coefficients to N evaluated polynomials at x=n
    return tensor_polyval(coeffs, tf.convert_to_tensor(n))
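
As a quick sanity check (run outside the model, and assuming the functions above behave as described), the Horner version matches tf.math.polyval on a small example:

coeffs = tf.constant([[1., 2., 3.],
                      [4., 5., 6.]])
x = tf.constant(5.)
print(tensor_polyval(coeffs, x).numpy())                        # [ 38. 131.]
print(tf.math.polyval(tf.unstack(coeffs, axis=1), x).numpy())   # same values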

Now here's a super-simplified example of my model, loss function and training routine:

def model_init(embedDim=8, polyDim=2,terms=2):
  input = K.Input(shape=(embedDim,))
  x = K.layers.Reshape((embedDim,))(input)
  aCoeffs = K.layers.Dense((polyDim+1)*terms, activation='tanh')(x)
  aCoeffs = K.layers.Reshape((terms, polyDim+1))(aCoeffs)

  model = K.Model(inputs=input, outputs=[aCoeffs, input])
  return model

def get_random_batch(batch, embedDim, dtype='float64'):
  x = np.random.randn(batch, embedDim).astype(dtype)
  y = np.array([1. for i in range(batch)]).astype(dtype)
  return [x, 
          y]

embedDim=8
polyDim=2
terms=2
dataType = 'float64'
tf.keras.backend.set_floatx(dataType)

@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
  # evaluate each set of predicted coefficients at x=5 and average everything (mock loss)
  an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
                                                    tf.constant(5,dtype=dataType)),
                         y_pred)

  return tf.reduce_mean(tf.reduce_mean(an,axis=-1))

model = model_init(embedDim, polyDim, terms)

XTrain, yTrain  = get_random_batch(batch=128,
                                  embedDim=embedDim)

# Init Model
LR = 0.001
loss = test_loss
epochs = 5
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=loss)

hist = model.fit(XTrain,
        yTrain,
        batch_size=4,
        epochs=epochs,
        max_queue_size=10, workers=2, use_multiprocessing=True)

The error I get is related to the tensor_polyval function:

    <ipython-input-15-f96bd099fe08>:3 test_loss  *
        an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
    <ipython-input-5-7205207d12fd>:23 coeffs_to_poly  *
        return tensor_polyval(coeffs, tf.convert_to_tensor(n))
    <ipython-input-5-7205207d12fd>:13 tensor_polyval  *
        p = coeffs[:, 0]
    ...
    ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_DOUBLE, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2](coeffs, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [3], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.

What's frustrating is that I'm perfectly able to predict with the model on sample inputs and also calculate a sample loss:

test_loss(yTrain[0:5],
          model.predict(XTrain[0:5]),
          dtype=dataType)

which runs just fine.
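
For reference, the shapes coming out of that predict call are:

preds = model.predict(XTrain[0:5])
print([p.shape for p in preds])   # [(5, 2, 3), (5, 8)]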

In the test_loss function I'm referring specifically to the first output, via y_p[0]. It calculates the value of the polynomials at n=5 and then averages over everything (again, this is just mocked code). As I understand it, y_p[1] would refer to the second output (in this case, a copy of the input). I would think tf.vectorized_map should operate across the batch for all of the model's outputs, but it seems to be slicing away one extra dimension?
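
To illustrate what I mean, here's a standalone sketch (mock tensors, not the actual model) of how I understand tf.vectorized_map to handle a list of two tensors:

coeffs_batch = tf.random.normal((5, 2, 3))   # stand-in for the first output
inputs_batch = tf.random.normal((5, 8))      # stand-in for the second output

# each y_p is a pair of per-example slices: y_p[0] has shape (2, 3), y_p[1] has shape (8,)
out = tf.vectorized_map(
    lambda y_p: tf.reduce_sum(y_p[0]) + tf.reduce_sum(y_p[1]),
    [coeffs_batch, inputs_batch])
print(out.shape)   # (5,)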

I noticed that the code does train if I remove the `,input` output from the model (making it a single-output model) and change y_p[0] to y_p in test_loss. I have no idea why it breaks when I add the extra output, since my understanding of tf.vectorized_map is that it acts separately on each element of the list y_pred.

Alex R.
  • Using multiple outputs introduces more complexity. It may be worth removing the second output (which is just the input) and testing that. With multiple outputs and one loss function, Keras calculates the loss of each output separately and sums them up. This is different from manually calling the loss function on the prediction result, so the different outcomes (error vs success) are expected. – Meow Cat 2012 Mar 17 '21 at 02:44
  • @MeowCat2012: Thanks for your suggestion. So what's really frustrating is that it works when I remove the output `,input` from the model and change `y_p[0]` to `y_p`. I'll add this to my post. I have no idea why it's broken when adding the extra output – Alex R. Mar 17 '21 at 02:50
  • If you still need the 2nd output, please refer to the newly added answer~ – Meow Cat 2012 Mar 17 '21 at 03:01

1 Answer


If we need a single loss function to receive multiple outputs together, we can concatenate them into one output.

In this case:

  1. Changes to the model structure, here we pack the outputs:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim, ))
    x = K.layers.Reshape((embedDim, ))(input)
    aCoeffs = K.layers.Dense((polyDim + 1) * terms, activation='tanh')(x)
    # pack the two outputs, add flatten layers if their shapes are not batch*K
    outputs = K.layers.Concatenate()([aCoeffs, input])
    
    model = K.Model(inputs=input, outputs=outputs)
    model.summary()
    return model
  2. Changes to the loss function, here we unpack the outputs:
# the loss function needs to know these
polyDim = 2
terms = 2

@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for flattened outputs."""
    
    # unpack multiple outputs
    offset = (polyDim + 1) * terms
    aCoeffs = tf.reshape(y_pred[:, :offset], [-1, terms, polyDim + 1])
    inputs = y_pred[:, offset:]
    
    print(aCoeffs, inputs)
    
    # do something with the two unpacked outputs, like below
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        aCoeffs)

    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
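
A quick shape check of the packed output and the unpacking (a sketch using the same dimensions as above, embedDim=8, polyDim=2, terms=2):

packed = model_init(8, 2, 2)(tf.zeros((4, 8), dtype='float64'))
print(packed.shape)                                       # (4, 14): 6 coefficients + 8 input features
offset = (2 + 1) * 2
print(tf.reshape(packed[:, :offset], [-1, 2, 3]).shape)   # (4, 2, 3)
print(packed[:, offset:].shape)                           # (4, 8)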

Notice that the loss function relies on knowing the original shapes of the outputs in order to restore them. Consider sub-classing tf.keras.losses.Loss so that the shape information lives on the loss object instead of in globals, as sketched below.
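
For example, here is a minimal sketch of that idea (the class name and constructor arguments are illustrative, not from the original code; it reuses coeffs_to_poly from the question):

class PackedPolyLoss(tf.keras.losses.Loss):
    def __init__(self, polyDim=2, terms=2, x=5.0, name="packed_poly_loss"):
        super().__init__(name=name)
        self.offset = (polyDim + 1) * terms
        self.terms = terms
        self.polyDim = polyDim
        self.x = x

    def call(self, y_true, y_pred):
        # unpack the concatenated outputs
        aCoeffs = tf.reshape(y_pred[:, :self.offset],
                             [-1, self.terms, self.polyDim + 1])
        # inputs = y_pred[:, self.offset:]   # available here if the real loss needs them
        an = tf.vectorized_map(
            lambda y_p: coeffs_to_poly(y_p, tf.constant(self.x, dtype=y_pred.dtype)),
            aCoeffs)
        return tf.reduce_mean(an, axis=-1)   # per-sample loss; Keras averages over the batch

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
              loss=PackedPolyLoss(polyDim=polyDim, terms=terms))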

P.S. For anyone who simply needs a different loss for each of the multiple outputs:

  1. Define loss functions for the two outputs.
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 1
    (Only changed y_p[0] to y_p)"""
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        y_pred)

    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))


@tf.function
def dummy_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 2 i.e. the input, for debugging
    Better use 0 instead of 1.2345"""
    return tf.constant(1.2345, dataType)
  2. Change to model.compile:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=[test_loss, dummy_loss])
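
Note that with two outputs and two losses, model.fit may (depending on the TF version) also expect one target per output; since dummy_loss ignores y_true, anything with the right batch size can serve as the second target, e.g.:

hist = model.fit(XTrain, [yTrain, XTrain],
                 batch_size=4, epochs=epochs)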
Meow Cat 2012
  • Thanks for the quick response! In my case though, my (actual) loss function will depend on each of the outputs. Surely it should be possible in that case? – Alex R. Mar 17 '21 at 03:07
  • Ah, but it seems like maybe keras doesn’t support what I want. So I probably need to concatenate the outputs into one flattened tensor and then reshape it during loss calculation. This answer seems to say multiple outputs in one loss aren’t possible: https://stackoverflow.com/a/44451189/2781958 – Alex R. Mar 17 '21 at 03:09
  • Means you need a single loss function to handle multiple outputs altogether (instead of separately) ? – Meow Cat 2012 Mar 17 '21 at 03:10
  • I'd flatten and concat the outputs, so there's only one output. I can split and reshape them back in the loss function. Ugly but working approach. – Meow Cat 2012 Mar 17 '21 at 03:11
  • Precisely. My loss merges the two outputs in a nontrivial way which I would prefer be part of the loss function and not the model layers themselves. – Alex R. Mar 17 '21 at 03:12
  • Thanks I really appreciate your help!! – Alex R. Mar 17 '21 at 04:29