
From the Keras manual I learn that validation_data can be:

  • A tuple (x_val, y_val) of NumPy arrays or tensors.
  • A tuple (x_val, y_val, val_sample_weights) of NumPy arrays.
  • A tf.data.Dataset.
  • A Python generator or keras.utils.Sequence returning (inputs, targets) or (inputs, targets, sample_weights).

My question is: since I am using multiple named inputs, could I use a tuple (x_val, y_val) as validation_data, where x_val is a dictionary of NumPy arrays (with keys equal to the names of the model's inputs) and y_val is a plain NumPy array?
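
For concreteness, here is a minimal sketch of what I mean; the model, input names, and shapes below are just placeholders I made up for illustration:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy model with two named inputs (names and shapes are placeholders)
in1 = keras.Input(shape=(8,), name="in1")
in2 = keras.Input(shape=(4,), name="in2")
out = layers.Dense(1)(layers.Concatenate()([in1, in2]))
model = keras.Model([in1, in2], out)
model.compile(optimizer="adam", loss="mse")

x_train = {"in1": np.random.rand(100, 8), "in2": np.random.rand(100, 4)}
y_train = np.random.rand(100, 1)
x_val = {"in1": np.random.rand(20, 8), "in2": np.random.rand(20, 4)}
y_val = np.random.rand(20, 1)

# Is this tuple-with-a-dict form supported?
model.fit(x_train, y_train, epochs=2, validation_data=(x_val, y_val))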

Thank you for your help.

Pignuzù
  • How is your original data stored? I can sketch out a more complete solution. But the answer to your question is: yes. In fact, that's what we do in our company, although you might want tensors instead of NumPy arrays in there. It also depends on what happens in your training loop (e.g., how your model consumes that data). – D_Serg Dec 18 '21 at 21:45
  • Tried passing a dictionary as my `validation_data` and ran into the same issue as [here](https://stackoverflow.com/questions/61706535/keras-validation-loss-and-accuracy-stuck-at-0). – Roman Feb 24 '22 at 22:15

1 Answer


Since you are using multiple named inputs, you cannot pass a tuple (x_val, y_val) for the validation_data parameter (at least, Keras does not currently support that). As per the TensorFlow and Keras documentation:

validation_data will override validation_split. validation_data could be:

  • A tuple (x_val, y_val) of NumPy arrays or tensors.
  • A tuple (x_val, y_val, val_sample_weights) of NumPy arrays.
  • A tf.data.Dataset.
  • A Python generator or keras.utils.Sequence returning (inputs, targets) or (inputs, targets, sample_weights).

validation_data is not yet supported with tf.distribute.experimental.ParameterServerStrategy.

Potential solution:

One potential solution is to concatenate the training and validation datasets and pass them to the fit method as the x and y arguments, specifying the validation part via validation_split. Note that:

The validation data is selected from the last samples in the x and y data provided, before shuffling.

More details

Let's say your dataset has two inputs, e.g. in1 and in2, and two outputs, e.g. out1 and out2.
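
As a sketch of that layout (the layer sizes, activations, and losses here are my own assumptions, not part of the problem statement), such a model could be built with the functional API, with len_in1, len_in2, len_out1, and len_out2 denoting the widths of the corresponding input/output blocks used in the code below:

from tensorflow import keras
from tensorflow.keras import layers

len_in1, len_in2, len_out1, len_out2 = 8, 4, 1, 1  # assumed widths

inp1 = keras.Input(shape=(len_in1,), name="in1")
inp2 = keras.Input(shape=(len_in2,), name="in2")
hidden = layers.Dense(32, activation="relu")(layers.Concatenate()([inp1, inp2]))
out1 = layers.Dense(len_out1, name="out1")(hidden)
out2 = layers.Dense(len_out2, name="out2")(hidden)
model = keras.Model(inputs=[inp1, inp2], outputs=[out1, out2])
model.compile(optimizer="adam", loss={"out1": "mse", "out2": "mse"})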

Optional reading

You can first shuffle your training and validation datasets as needed:

import numpy as np

# Stack features and labels column-wise so each row holds one complete sample
concat_xy_train = np.concatenate((train_in1, train_in2, train_out1, train_out2), axis=1)
concat_xy_val = np.concatenate((val_in1, val_in2, val_out1, val_out2), axis=1)
# Shuffle rows in place; inputs and outputs move together
np.random.shuffle(concat_xy_train)
np.random.shuffle(concat_xy_val)
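
Because np.random.shuffle permutes whole rows of the concatenated array, each sample's inputs and labels stay aligned. Note that this trick assumes all arrays are 2-D with one row per sample; otherwise the axis=1 concatenation will fail.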

You can then retrieve your features and labels:

# len_in1, len_in2, len_out1 are the column widths of the corresponding blocks
shuf_train_in1 = concat_xy_train[:,:len_in1]
shuf_train_in2 = concat_xy_train[:,len_in1:len_in1+len_in2]
shuf_train_out1 = concat_xy_train[:,len_in1+len_in2:len_in1+len_in2+len_out1]
shuf_train_out2 = concat_xy_train[:,len_in1+len_in2+len_out1:]

# Same column layout for the validation array
shuf_val_in1 = concat_xy_val[:,:len_in1]
shuf_val_in2 = concat_xy_val[:,len_in1:len_in1+len_in2]
shuf_val_out1 = concat_xy_val[:,len_in1+len_in2:len_in1+len_in2+len_out1]
shuf_val_out2 = concat_xy_val[:,len_in1+len_in2+len_out1:]

Concatenation of training and validation datasets

# Stack training rows on top of validation rows
train_val_in1 = np.concatenate((shuf_train_in1, shuf_val_in1), axis=0)
train_val_in2 = np.concatenate((shuf_train_in2, shuf_val_in2), axis=0)
train_val_out1 = np.concatenate((shuf_train_out1, shuf_val_out1), axis=0)
train_val_out2 = np.concatenate((shuf_train_out2, shuf_val_out2), axis=0)
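
The order matters here: as quoted above, validation_split selects the validation samples from the end of the provided arrays, so placing the validation rows last reproduces your original train/validation split exactly.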

Fitting the model

When fitting the model:

model.fit(
    {"in1": train_val_in1, "in2": train_val_in2},
    {"out1": train_val_out1, "out2": train_val_out2},
    # fraction of the combined data reserved for validation
    validation_split=len_val/(len_val+len_train),
...
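
One caveat worth knowing: fit's default shuffle=True is still safe with this setup, because the validation_split slice is taken before shuffling and only the training portion is shuffled between epochs.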
learner