
To build a regression/forecasting model, I'd like to take a matrix of sensor readings (rows ~ sensors, columns ~ time points) and predict a future trend for these sensors.

Example implementation

# install.packages(c("keras", "tensorflow"))
library(keras)
library(tensorflow)

#' Prepare some training data: map matrices to smaller matrices whose response entries correspond to basic math
n = 1000000
nb = 10
mx = matrix(rnorm(6 * n, 0, 1), nrow = n, byrow = TRUE)
my = matrix(0, nrow = n, ncol = 3)
eps = 0.01

for (i in 1 : n) {
    x1 = mx[i, 1]; x2 = mx[i, 2]; x3 = mx[i, 3]; x4 = mx[i, 4]; x5 = mx[i, 5]; x6 = mx[i, 6];
    s1 = x1 * x1;   s2 = x2 * x2;   s3 = x3 * x3;   s4 = x4 * x4;   s5 = x5 * x5;   s6 = x6 * x6;
    zz = rnorm(1, 0, 1)

    my[i, 1] = (x1 + x2 + x3 + x4 + x5 + x6 + eps * zz)
    my[i, 2] = (s1 + s2 + eps * zz * zz)
    my[i, 3] = (x1 * s1 + s2 + x5 * s5 + x6 * s6 + eps * zz)
}

#' Recast into tf types
x_train = tf$constant(mx, shape = as.integer(c(n / nb, nb, 6)))
# FLATTENING the input would work, e.g.:
# x_train = tf$constant(mx, shape = as.integer(c(n / nb, nb * 6)))
y_train = tf$constant(my, shape = as.integer(c(n / nb, nb, 3)))
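#' Quick shape check: dim() works on the tf constants, as used for inputShape below
dim(x_train)  # expected: 100000 10 6
dim(y_train)  # expected: 100000 10 3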


#' Build the model
inputShape = dim(x_train)[- 1] 
outputShape = dim(y_train)[- 1]

model1 = keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu", input_shape = inputShape) %>%
    layer_dense(units = 256, activation = "relu") %>%
    layer_dense(units = prod(outputShape)) %>%
    layer_reshape(outputShape) %>%
    compile(loss = "mse", optimizer = "adam", metrics = list("mean_absolute_error", "mean_squared_error"))

model1 %>% summary
fit(model1, x_train, y_train, epochs = 3, validation_split = 0.2, verbose = 1)

model2 = keras_model_sequential() %>%
## tbd: layer_input --> layer_reshape --> layer_dense (which seems to work best with non-matrix-valued inputs)
    layer_dense(units = 64, activation = "relu", input_shape = inputShape) %>%
    layer_dense(units = 256, activation = "relu") %>%
    layer_dense(units = outputShape[2]) %>%
# layer_dense(units = prod(outputShape)) %>%
# layer_reshape(outputShape) %>%
    compile(loss = "mse", optimizer = "adam", metrics = list("mean_absolute_error", "mean_squared_error"))

model2 %>% summary

fit(model2, x_train, y_train, epochs = 3, validation_split = 0.2, verbose = 1)
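
#' Sanity check (a sketch, assuming the fit above succeeded): predictions from
#' model2 should keep the matrix shape of y_train
pred2 = predict(model2, x_train)
dim(pred2)  # expected: 100000 10 3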

Summary of model1 is

> model1 %>% summary
Model: "sequential_42"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #
================================================================================
dense_126 (Dense)                   (None, 10, 64)                  448
________________________________________________________________________________
dense_127 (Dense)                   (None, 10, 256)                 16640
________________________________________________________________________________
dense_128 (Dense)                   (None, 10, 30)                  7710
________________________________________________________________________________
reshape_29 (Reshape)                (None, 10, 3)                   0
================================================================================
Total params: 24,798
Trainable params: 24,798
Non-trainable params: 0
________________________________________________________________________________

Summary of model2 is

> model2 %>% summary
Model: "sequential_43"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #
================================================================================
dense_129 (Dense)                   (None, 10, 64)                  448
________________________________________________________________________________
dense_130 (Dense)                   (None, 10, 256)                 16640
________________________________________________________________________________
dense_131 (Dense)                   (None, 10, 3)                   771
================================================================================
Total params: 17,859
Trainable params: 17,859
Non-trainable params: 0
________________________________________________________________________________

Although both models have the same input and output shape, model1 fails to train with:

Error in py_call_impl(callable, dots$args, dots$keywords) :
  ValueError: in user code:

    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:531 train_s

By flattening the input it works (see the commented line where x_train is defined). However, I wonder why we can't use matrix-shaped input values here for the dense layer (or how to do so correctly)?

Note: The example is written using https://keras.rstudio.com/ but since it's a rather 1:1 wrapper API, I'd be happy with a Python answer as well.

Holger Brandl

2 Answers


According to the Dense docs (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense):

If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 1 of the kernel (using tf.tensordot)

Therefore, if the input tensor has a shape (a,b,c) and the Dense layer has d units, the output tensor has a shape (a,b,d). If you pass your tensor through multiple Dense layers, only the last dimension will change.
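
For instance, a toy model on a (10, 6)-shaped input shows this (a minimal sketch using the same R API as in the question):

demo = keras_model_sequential() %>%
    layer_dense(units = 4, input_shape = c(10, 6))
demo %>% summary
# the dense layer's output shape is (None, 10, 4), not (None, 4)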

Now, given that the code runs with a flattened x, a potential problem is a shape mismatch: y_train does not seem to have the same dimensions as the output of the network.

According to this

x_train = tf$constant(mx, shape = as.integer(c(n / nb, nb, 6)))
y_train = tf$constant(my, shape = as.integer(c(n / nb, nb, 3)))

x_train and y_train have identical dimensions except for the last one. Then, for the predictions and y_train to have the same dimensions, your model should end with something like

layer_dense(units = outputShape[2]) %>%  # the last dimension of y_train

instead of

layer_dense(units = prod(outputShape)) %>%
layer_reshape(outputShape) %>%

That's just the technical side though. Not sure if conceptually it is what you are after.
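
If you do want to keep the matrix-shaped input together with the prod/reshape idea from model1, flattening inside the network before the final dense layer should also work. A minimal, untested sketch using the shapes defined in the question:

model3 = keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu", input_shape = inputShape) %>%
    layer_dense(units = 256, activation = "relu") %>%
    layer_flatten() %>%                          # (None, 10, 256) -> (None, 2560)
    layer_dense(units = prod(outputShape)) %>%   # (None, 30)
    layer_reshape(outputShape) %>%               # (None, 10, 3)
    compile(loss = "mse", optimizer = "adam", metrics = list("mean_absolute_error", "mean_squared_error"))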

  • Thanks @Alexandr for your guidance. I edited my question to be clearer about the shapes (which seem fine at first glance). Your pointer to the rank behaviour in the tf docs was very valuable for understanding the internal process. However, it does not yet answer my question fully about *why* model1 (vectorized input and output using a reshape layer to enable arbitrary output tensor shapes) fails to train. Btw, small world -> Best regards to Robert, Martin and scicomp. :-) – Holger Brandl Jun 02 '20 at 10:50
  • Hi @Holger! Small world indeed :-) In the case of model1 you are trying to take the output of dense_128 with the shape of [None, 10, 30] and reshape it into a tensor with the shape [None, 10, 3]. That doesn't make much sense :-) And potentially results in the error. What kind of behavior are you trying to achieve? – Alexandr Dibrov Jun 02 '20 at 12:09
  • Indeed, the wrong assumption in model1 was that `prod(outputShape)` would provide a layer that could be subsequently reshaped into `outputShape`. This is not the case because of the rank-selective behavior of tf. Thanks for your help. – Holger Brandl Jun 02 '20 at 20:25

Well, since you have a time-dependent dataset, why not try the keras.layers.TimeDistributed API and see if your data sequence is aligned/sorted by time points?
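
For example, something along these lines with the R wrapper (a rough, untested sketch; note that wrapping layer_dense in time_distributed behaves much like applying the dense layer directly to the 3D input):

model_td = keras_model_sequential() %>%
    time_distributed(layer_dense(units = 64, activation = "relu"), input_shape = c(10, 6)) %>%
    time_distributed(layer_dense(units = 3)) %>%
    compile(loss = "mse", optimizer = "adam")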

Leon Wang
  • Maybe, but it's not clear how to do so. The use-cases of TimeDistributed which I know of always center around the output. Following on your comment I've found https://stackoverflow.com/questions/47305618/what-is-the-role-of-timedistributed-layer-in-keras where it states that *in keras from version 2.0 Dense is by default applied to only last dimension* (see full explanation in the link). So maybe this could be a lead why my code fails? – Holger Brandl May 29 '20 at 09:48
  • Another followup which I dug out thanks to your suggestion is https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ which states that *Dense layer can now directly support 3D input* which justifies the question. So it seems to me that I'm just using the API incorrectly. – Holger Brandl May 29 '20 at 09:50