I’m creating an Artificial Neural Network (ANN) using the Keras Functional API. Link to the data CSV file: https://github.com/dpintof/SPX_Options_ANN/blob/master/MLP3/call_df.csv. Here is the relevant part of the code that reproduces the problem:

import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras import layers


# Data 
call_df = pd.read_csv("call_df.csv")

call_X_train, call_X_test, call_y_train, call_y_test = train_test_split(call_df.drop(["Option_Average_Price"],
                    axis = 1), call_df.Option_Average_Price, test_size = 0.01)


# Hyperparameters
n_hidden_layers = 2 # Number of hidden layers.
n_units = 128 # Number of neurons of the hidden layers.

# Create input layer
inputs = keras.Input(shape = (call_X_train.shape[1],))
x = layers.LeakyReLU(alpha = 1)(inputs)

"""
Function that creates a hidden layer by taking a tensor as input and applying a
modified ELU (MELU) activation function.
"""
def hl(tensor):
    # Create custom MELU activation function
    def melu(z):
        return tf.cond(z > 0, lambda: ((z**2)/2 + 0.02*z) / (z - 2 + 1/0.49), 
                        lambda: 0.49*(keras.activations.exponential(z)-1))
    
    y = layers.Dense(n_units, activation = melu)(tensor)
    return y

# Create hidden layers
for _ in range(n_hidden_layers):
    x = hl(x)

# Create output layer
outputs = layers.Dense(1, activation = keras.activations.softplus)(x)

# Actually create the model
model = keras.Model(inputs=inputs, outputs=outputs)


# QUICK TEST
model.compile(loss = "mse", optimizer = keras.optimizers.Adam())
history = model.fit(call_X_train, call_y_train, 
                    batch_size = 4096, epochs = 1,
                    validation_split = 0.01, verbose = 1)

This is the error I get when I call model.fit(…) (notice that 4096 is my batch size and 128 is the number of neurons in each hidden layer):

InvalidArgumentError:  The second input must be a scalar, but it has shape [4096,128]
     [[{{node dense/cond/dense/BiasAdd/_5}}]] [Op:__inference_keras_scratch_graph_1074]

Function call stack:
keras_scratch_graph

I know the problem has to do with the custom activation function because the program runs fine if I use the following hl function instead:

def hl(tensor):
    lr = layers.Dense(n_units, activation = layers.LeakyReLU())(tensor)
    return lr

I got the same error when trying to define melu(z) like this:

@tf.function
def melu(z):
    if z > 0:
        return ((z**2)/2 + 0.02*z) / (z - 2 + 1/0.49)
    else:
        return 0.49*(keras.activations.exponential(z)-1)

Following the approach from "How do you create a custom activation function with Keras?", I also tried the following, but without success:

def hl(tensor):
    # Create custom MELU activation function
    def melu(z):
        return tf.cond(z > 0, lambda: ((z**2)/2 + 0.02*z) / (z - 2 + 1/0.49), 
                        lambda: 0.49*(keras.activations.exponential(z)-1))
    
    from keras.utils.generic_utils import get_custom_objects
    get_custom_objects().update({'melu': layers.Activation(melu)})
 
    x = layers.Dense(n_units)(tensor)
    y = layers.Activation(melu)(x)
    return y
1 Answer

This issue happens because tf.cond expects a scalar for its condition argument, not a multi-dimensional tensor. You can use tf.where instead, which applies the condition element-wise.

For example, you can define melu as follows:

def melu(z):
    return tf.where(z > 0, ((z**2)/2 + 0.02*z) / (z - 2 + 1/0.49), 
                           0.49*(keras.activations.exponential(z)-1))

NOTE: Not tested.
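
For a quick element-wise sanity check (a minimal sketch, assuming TensorFlow 2.x with eager execution), melu can be called directly on a small 2-D tensor:

import numpy as np
import tensorflow as tf
from tensorflow import keras

def melu(z):
    return tf.where(z > 0, ((z**2)/2 + 0.02*z) / (z - 2 + 1/0.49),
                           0.49*(keras.activations.exponential(z)-1))

# Unlike tf.cond, tf.where has no scalar-predicate restriction, so a whole
# batch of activations (e.g. shape [batch_size, n_units]) passes through.
sample = tf.constant(np.linspace(-2.0, 2.0, 8, dtype="float32").reshape(2, 4))
print(melu(sample).shape)  # (2, 4)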

  • Your answer solved the error. However, my training and validation errors return nan. Any idea why that is? – Wasonic Apr 05 '21 at 07:42
  • Strange, I can train your model without any issue (on random data). Could you please make sure there are no NaNs or missing values in your input data? – rvinas Apr 05 '21 at 08:16
  • Tried removing rows with NaN (if there were any) and got the same result. Went back to the simpler hl function (in the original post) and got ordinary numeric (non-NaN) errors again. So there still seems to be some problem with the melu function. – Wasonic Apr 06 '21 at 07:44
  • Where does this activation function come from? Does it make sense mathematically? You can try training with random data - the implementation worked for me. – rvinas Apr 06 '21 at 08:28
  • The activation function comes from "Andrey Itkin (2019) Deep learning calibration of option pricing models: some pitfalls and solutions". It's basically a paper on option pricing and calibration of models that price options. I think the activation function makes sense. For the purposes of this paper we need C^2 functions like MELU. – Wasonic Apr 09 '21 at 04:57
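
As a minimal sketch of the random-data check suggested in the comments (it assumes the model from the question has been rebuilt with the tf.where version of melu): if the loss stays finite on random inputs, the NaNs most likely come from the real data (missing values or very large feature scales) rather than from the activation itself.

import numpy as np

# Random inputs with the same number of features as the real data.
rng = np.random.default_rng(0)
X_fake = rng.normal(size=(1024, call_X_train.shape[1])).astype("float32")
y_fake = rng.uniform(0.0, 1.0, size=(1024,)).astype("float32")

# One epoch on random data; if this loss is finite, inspect the real data.
model.compile(loss = "mse", optimizer = keras.optimizers.Adam())
history = model.fit(X_fake, y_fake, batch_size = 256, epochs = 1, verbose = 1)
print(history.history["loss"])

# It is also worth checking the real data for missing values directly:
print(call_X_train.isna().sum().sum(), call_y_train.isna().sum())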