I have a TensorFlow model whose outputs correspond to the coefficients of multiple polynomials. Note that my model actually has another set of outputs (it is multi-output), but I've mocked this below by returning the input in addition to the polynomial coefficients.
I'm having a lot of trouble training the model because of tensor shapes. I've verified that the model can predict on sample inputs and that the loss function works on sample outputs. But during training it immediately throws an error (see below).
The model takes a fixed-size embedding as input and, for every input, outputs the coefficients of 2 polynomials of degree 2. For example, the output on a single input can look like:
[array([[[1, 2, 3],
         [4, 5, 6]]]),
 [...]]
corresponding to the polynomials [1*x^2+2*x+3, 4*x^2+5*x+6]. Note that I've hidden the second output.
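To make the layout concrete, here is a small NumPy-only sketch. The names `a_coeffs` and `values_at_5` are made up for illustration and are not part of my model. It treats the first output as an array of shape (batch, terms, polyDim+1) and evaluates each row with `np.polyval`, which uses the same highest-degree-first ordering:

```python
import numpy as np

# Mocked first output for a batch of one input:
# shape is (batch, terms, polyDim + 1) = (1, 2, 3)
a_coeffs = np.array([[[1, 2, 3],
                      [4, 5, 6]]])

# Each row holds the coefficients of one polynomial, highest degree first,
# so [1, 2, 3] means 1*x^2 + 2*x + 3.
values_at_5 = [np.polyval(row, 5) for row in a_coeffs[0]]

print(a_coeffs.shape)  # (1, 2, 3)
print(values_at_5)     # 1*25 + 2*5 + 3 = 38 and 4*25 + 5*5 + 6 = 131
```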
I noticed that tf.math.polyval requires a list of coefficients, making it wonky with autograd. So I implemented my own version of Horner's algorithm with pure tensors.
import numpy as np
import tensorflow as tf
import logging
import tensorflow.keras as K

@tf.function
def tensor_polyval(coeffs, x):
    """
    Calculates polynomial scalars from tensor of polynomial coefficients
    Tensorflow tf.math.polyval requires a list coeff, which isn't compatible with autograd
    # Inputs:
    - coeffs (NxD Tensor): each row r of coeffs corresponds to r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D-1]
    - x: Scalar!
    # Output:
    - r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D-1] for each row r in coeffs
    """
    # Horner's method: start from the leading coefficient of every row
    p = coeffs[:, 0]
    for i in range(1, coeffs.shape[1]):
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(p, tf.TensorShape([None]))])
        c = coeffs[:, i]
        p = tf.add(c, tf.multiply(x, p))
    return p

@tf.function
def coeffs_to_poly(coeffs, n):
    # Converts a NxD array of coefficients to N evaluated polynomials at x=n
    return tensor_polyval(coeffs, tf.convert_to_tensor(n))
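As a sanity check on the recurrence itself, here is a NumPy-only sketch of the same row-wise Horner scheme, compared against a direct sum of powers. `horner_rows` is a hypothetical stand-in for `tensor_polyval`, not the TensorFlow function; it only shows that the loop computes the intended values for a 2-D coefficient array.

```python
import numpy as np

def horner_rows(coeffs, x):
    """Evaluate each row of an (N, D) coefficient array at scalar x (highest degree first)."""
    p = coeffs[:, 0]
    for i in range(1, coeffs.shape[1]):
        # p <- c_i + x * p, applied to all rows at once
        p = coeffs[:, i] + x * p
    return p

coeffs = np.array([[1., 2., 3.],
                   [4., 5., 6.]])
x = 5.0

# Direct evaluation for comparison: sum over k of c_k * x^(D-1-k)
powers = x ** np.arange(coeffs.shape[1] - 1, -1, -1)  # [25., 5., 1.]
direct = coeffs @ powers

result = horner_rows(coeffs, x)
print(result, direct)
```

Note that the slice `coeffs[:, 0]` only makes sense when `coeffs` has two dimensions, which is the assumption the error below ends up violating.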
Now here's a super-simplified example of my model, loss function and training routine:
# Settings (defined first, because test_loss uses dataType as a default argument)
embedDim = 8
polyDim = 2
terms = 2
dataType = 'float64'
tf.keras.backend.set_floatx(dataType)

def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim,))
    x = K.layers.Reshape((embedDim,))(input)
    aCoeffs = K.layers.Dense((polyDim+1)*terms, activation='tanh')(x)
    aCoeffs = K.layers.Reshape((terms, polyDim+1))(aCoeffs)
    # The second output is a mock: it just returns the input
    model = K.Model(inputs=input, outputs=[aCoeffs, input])
    return model

def get_random_batch(batch, embedDim, dtype='float64'):
    x = np.random.randn(batch, embedDim).astype(dtype)
    y = np.ones(batch, dtype=dtype)  # dummy targets
    return [x, y]

@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
                                                      tf.constant(5, dtype=dtype)),
                           y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))

model = model_init(embedDim, polyDim, terms)
XTrain, yTrain = get_random_batch(batch=128,
                                  embedDim=embedDim)

# Init Model
LR = 0.001
loss = test_loss
epochs = 5
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=loss)
hist = model.fit(XTrain,
                 yTrain,
                 batch_size=4,
                 epochs=epochs,
                 max_queue_size=10, workers=2, use_multiprocessing=True)
The error I get is related to the tensor_polyval function:
<ipython-input-15-f96bd099fe08>:3 test_loss *
an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
<ipython-input-5-7205207d12fd>:23 coeffs_to_poly *
return tensor_polyval(coeffs, tf.convert_to_tensor(n))
<ipython-input-5-7205207d12fd>:13 tensor_polyval *
p = coeffs[:, 0]
...
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_DOUBLE, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2](coeffs, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [3], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.
What's frustrating is that I'm perfectly able to predict with the model on sample inputs and also calculate a sample loss:
test_loss(yTrain[0:5],
          model.predict(XTrain[0:5]),
          dtype=dataType)
which runs just fine.
In the test_loss function, I'm referring only to the first output, via y_p[0]. It calculates the value of the polynomials at n=5 and then outputs an average over everything (again, this is just mocked code). As I understand it, y_p[1] would refer to the second output (in this case, a copy of the input). I would think tf.vectorized_map should be operating across all outputs of the model batch, but it seems to be slicing one extra dimension.
I noticed that the code does train if I remove the second output (the ", input") in the model, making it single-output, and change y_p[0] to y_p in test_loss. I have no idea why it breaks when I add the extra output, as my understanding is that tf.vectorized_map acts separately on each element of the list y_pred.
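To narrow this down, here is a NumPy-only sketch of the shapes in the two situations. It rests on two points. First, as far as I can tell from the documentation, tf.vectorized_map unpacks every tensor in a nested structure along its first dimension, so a list of two outputs yields one pair per sample. Second, and this is only my guess, not something I have confirmed, Keras may call the loss once per output during fit, so that y_pred is a single tensor rather than the list of both outputs. The names and sizes below are illustrative.

```python
import numpy as np

batch, terms, poly_dim, embed_dim = 4, 2, 2, 8
a_coeffs = np.zeros((batch, terms, poly_dim + 1))  # first model output
inputs = np.zeros((batch, embed_dim))              # second (mocked) output

# Case 1: loss called by hand on the list returned by model.predict.
# Unpacking every element along the batch axis gives one (coeffs, input)
# pair per sample, so y_p[0] is a full (terms, poly_dim + 1) block.
per_sample = list(zip(a_coeffs, inputs))
shape_manual = per_sample[0][0].shape

# Case 2 (assumed): loss called on the first output tensor only.
# Unpacking along the batch axis gives one (terms, poly_dim + 1) block per
# sample, so y_p[0] is a single row of coefficients.
shape_fit = a_coeffs[0][0].shape

# Slicing [:, 0] on that 1-D row fails, analogous to the StridedSlice error.
try:
    a_coeffs[0][0][:, 0]
    slicing_failed = False
except IndexError:
    slicing_failed = True

print(shape_manual, shape_fit, slicing_failed)
```

If the guess is right, the 1-D shape in case 2 would match the `input shapes: [3]` reported for `coeffs` in the traceback.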