I am trying to use Keras to set up a neural network which is supposed to approximate an unknown function F(x). The resulting neural-network approximation for F(x) should then be used to find the minimum of the sum G(x) + F(x), where G(x) is some arbitrary function. The problem I am currently facing is that the neural-network approximation for F(x) is not smooth enough, and consequently local optimizers get stuck in tiny local minima. I would be very thankful for any ideas on how to improve my results or to solve this problem.
Simple example: Minimizing the variance
Let me illustrate the problem with a very simplified example: I will try to teach a neural network the function F(x) = 4*var(x) with x = [x1,...,xN] and 0 <= xi <= 1, where var(x) denotes the variance of the vector x. Subsequently I will try to find the minimum of the neural-network representation of F(x) under the constraint that x has a given mean value. The complete code for this example can be found in sec. 3 below.
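To make the target function concrete, here is a quick sanity check (just a small sketch; the second vector reappears as the starting point x0 in sec. 2): the constant vector with mean 0.5 gives F(x) = 0, while a vector with the same mean that puts all its weight on half of the entries gives F(x) = 1.
import numpy as np
x_const = np.full(6, 0.5)                      # constant vector, mean = 0.5
x_split = np.array([1., 1., 1., 0., 0., 0.])   # same mean, maximal spread
print(4*x_const.var())  # 0.0 -> the constrained minimum
print(4*x_split.var())  # 1.0 -> far away from the minimum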
1. Creation and training of neural network
First I create a neural network for the approximation of F(x):
N = 6 # Dimension of input vector x
# Set up the neural network
model = Sequential()
model.add(Dense(50, activation='sigmoid', input_dim=N))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='mse')
Subsequently I generate training data and train the model:
# Generate training data
n_train = 100000 # Number of training samples
X_train = random((n_train, N))
y_train = 4*X_train.var(axis=1)
# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
After the training is completed I test the accuracy of the model. To this end I use test(100) (for the definition of the test function see the full code in sec. 3 below); for my particular model I get an average error of 0.00517, which seems to be a pretty good result.
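To put this number into perspective (a rough sanity check only), it can be compared with the typical size of the target values:
# Typical magnitude of the targets, for comparison with the 0.00517 error
print('mean target value: %.3f' % y_train.mean())  # ~0.28 for uniform inputs
So the average error corresponds to roughly 2% of a typical function value.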
2. Minimizing F(x)
Having the neural-network approximation for F(x), I want to find its minimum under the constraint that x has a given average value. To this end I will try the local optimizer minimize as well as the global optimizer differential_evolution from scipy.optimize.
Local optimization
I try to minimize F(x) under the constraint mean(x) = 0.5. Clearly the exact result F_min = 0, i.e., the minimum of the variance under the given constraint, is obtained for the constant vector x = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]. I purposely choose a bad starting vector x0 in order to check whether the optimizer can find its way to the minimum:
# Constraint
avg = 0.5 # Given average value of x
cons = {'type': 'eq', 'fun': lambda x: x.mean()-avg}
# Start vector
x0 = avg * np.array([2, 2, 2, 0, 0, 0])
# Minimization of neural-network representation
res_ML = minimize(lambda x: model.predict([[x]]), x0,
bounds=N*[(0, 1)], constraints=cons)
# Minimization of the exact function
res_ex = minimize(lambda x: 4*x.var(), x0,
bounds=N*[(0, 1)], constraints=cons)
The results for my model are as follows:
>>> res_ML.success
True
>>> res_ML.x
array([1., 1., 1., 0., 0., 0.])
>>> res_ex.x
array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
Using the neural-network representation for F(x), the optimizer got stuck immediately. Using the exact function F(x) = 4*var(x), the optimizer found the correct result.
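To see what the optimizer is fighting against, the model can be probed along the straight line from x0 to the exact minimizer (a diagnostic sketch only, using the objects defined above); every point on this line satisfies mean(x) = 0.5, so any wiggles along the path belong to the constrained objective itself:
# Evaluate the model along the line from x0 to the exact minimizer
x_opt = np.full(N, avg)                       # exact minimizer [0.5, ..., 0.5]
ts = np.linspace(0, 1, 201)                   # interpolation parameter
path = np.array([(1 - t)*x0 + t*x_opt for t in ts])
pred_path = model.predict(path).flatten()     # neural-network values along the path
exact_path = 4*path.var(axis=1)               # exact values along the path
For my particular model, plotting pred_path against ts makes the small-scale roughness visible, whereas exact_path is simply the smooth parabola (1-t)**2.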
Global optimization
Instead of a local optimizer I decided to try a global optimizer. First I tried to use shgo from scipy.optimize, since it supports constraints; however, it seems to be unable to find the minimum of F(x) even if the exact function is used (more details about this problem can be found here). Therefore I tried differential_evolution. Since differential_evolution does not support constraints, I enforce the condition mean(x) = 0.5 by means of a penalty function:
# Minimization of neural-network representation
res2_ML = differential_evolution(lambda x: model.predict([[x]]) +
1e3*(np.mean(x)-avg)**2, bounds=N*[(0, 1)])
# Minimization of the exact function
res2_ex = differential_evolution(lambda x: 4*x.var() + 1e3*(np.mean(x)-avg)**2,
bounds=N*[(0, 1)])
The results I obtain are as follows:
>>> res2_ML.success
True
>>> res2_ML.x
array([0.50276561, 0.49869386, 0.49310187, 0.49895304, 0.4987404 ,
0.50770651])
>>> res2_ex.x
array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
>>> [float(model.predict([[x]])) for x in (res2_ML.x, res2_ex.x)]
[0.05173008516430855, 0.05170735716819763]
The result obtained with the neural-network approximation for F(x) is already better than in the case of local optimization, but it is still not optimal. The issue is not that the model actually predicts the minimum to lie at res2_ML.x: as the last line shows, the model's prediction for the correct vector res2_ex.x is in fact lower. I have also tried tol=1e-12 in the call to differential_evolution in order to increase the accuracy of the result, but without any noticeable improvement.
3. The complete code
import numpy as np
from numpy.random import random
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from scipy.optimize import minimize, differential_evolution, shgo
# Parameters
N = 6 # Length of input vectors
n_train = 100000 # Number of training samples
# Generate training data
X_train = random((n_train, N))
y_train = 4*X_train.var(axis=1)
# Set up and train the neural network
model = Sequential()
model.add(Dense(50, activation='sigmoid', input_dim=N))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
# ############################ Local Optimization #############################
# Constraint
avg = 0.5 # Given average value of x
cons = {'type': 'eq', 'fun': lambda x: x.mean()-avg}
# Start vector
x0 = avg * np.array([2, 2, 2, 0, 0, 0])
# Minimization of neural-network representation
res_ML = minimize(lambda x: model.predict([[x]]), x0,
bounds=N*[(0, 1)], constraints=cons)
# Minimization of the exact function
res_ex = minimize(lambda x: 4*x.var(), x0,
bounds=N*[(0, 1)], constraints=cons)
# ########################### Global Optimization #############################
# Minimization of neural-network representation using differential_evolution
res2_ML = differential_evolution(lambda x: model.predict([[x]]) +
1e3*(np.mean(x)-avg)**2, bounds=N*[(0, 1)],
tol=1e-6)
# Minimization of neural-network representation using shgo
res3_ML = shgo(lambda x: model.predict([[x]]), bounds=N*[(0, 1)],
constraints=cons, sampling_method='sobol')
# Minimization of the exact function
res2_ex = differential_evolution(lambda x: 4*x.var() + 1e3*(np.mean(x)-avg)**2,
bounds=N*[(0, 1)])
# ############################# Helper Function ###############################
def test(n_test):
    '''
    Function for testing the model.
    '''
    x = random((n_test, N))            # Test data
    pred = model.predict(x).flatten()  # Model predictions
    exct = 4*x.var(axis=1)             # Exact values
    diff = np.abs(pred - exct)         # Absolute differences
    # Print the test results to screen
    print('\npred. | exact | diff.')
    print('---------------------------')
    for k in range(n_test):
        print('%.5f | %.5f | %.5f' % (pred[k], exct[k], diff[k]))
    print('---------------------------')
    print(' avg. error | %.5f' % diff.mean())
Update
I made a mistake with the tol parameter for the differential_evolution method. Many thanks to Romeo Valentin for pointing this out. I have corrected this mistake in sec. 3. Using the tol parameter correctly definitely does improve the results of differential_evolution.
Furthermore, after posting an issue regarding the shgo optimizer on GitHub, it turned out that the shgo optimizer works well if the sobol sampling method is used:
# Minimization of neural-network representation using shgo
res3_ML = shgo(lambda x: model.predict([[x]]), bounds=N*[(0, 1)],
constraints=cons, sampling_method='sobol')
Using this sampling method the result is perfect:
>>> res3_ML.success
True
>>> res3_ML.x
array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
I have added the minimization using shgo to the full code in sec. 3.
I regard this problem as basically solved. However, I am still wondering whether my neural network is really the right choice for this kind of task, or whether there are more advanced structures or activation functions that could yield a smooth(er) function approximation.
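For example, one variant I could imagine trying (purely a sketch, I have not verified that it actually produces a smoother surface) would be smoother, non-saturating hidden activations together with a linear output layer, since this is a regression problem:
# Sketch of an alternative architecture (untested): softplus hidden layers
# and a linear output instead of sigmoid everywhere
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
model_alt = Sequential()
model_alt.add(Dense(50, activation='softplus', input_dim=N))
model_alt.add(Dense(50, activation='softplus'))
model_alt.add(Dense(1, activation='linear'))
model_alt.compile(optimizer='adam', loss='mse')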