I'm trying to attack an MLP with DeepFool, but when I plot the results I see some strange behavior.
First of all, the MLP structure is as follows:

```
Dense(16, activation='relu', input_shape=(512,))
Dense(16, activation='relu')
Dense(2, activation='softmax')
```
It is trained in the standard way, passing a training set and a validation set with labels, using the following parameters:
```
training_params = {
    'optimizer': 'adam',
    'loss': 'sparse_categorical_crossentropy',
    'metrics': ['accuracy']
}
```
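For reference, the model is built and compiled roughly like this (a minimal sketch; `X_train`, `Y_train`, `X_val`, `Y_val` and the epoch count are placeholders for my actual setup):

```
import tensorflow as tf

MLP = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(512,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
MLP.compile(**training_params)

# X_train/Y_train, X_val/Y_val and the epoch count stand in for my real data
MLP.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=50)
```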
I know that DeepFool should be applied to the logits, i.e. with the classification (softmax) layer removed, so I deleted the last Dense layer and created a KerasClassifier:
```
import tensorflow as tf
from art.estimators.classification import KerasClassifier

# Truncate the network before the final Dense layer and wrap it for ART
logit_model = tf.keras.Model(MLP.input, MLP.layers[-2].output)
classifier = KerasClassifier(clip_values=(0, 8), model=logit_model)
```
N.B. clip_values=(0, 8) because the feature vectors take values between 0 and 8.
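As a quick sanity check on my side, the truncated model and the clip range look consistent with the data (variable names as in the attack code below):

```
# Output shape of the truncated model, i.e. what ART will treat as logits
print(logit_model.output_shape)

# The features should lie inside the clip range (0, 8)
print(X_test_copy.min(), X_test_copy.max())
```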
When I try to attack the MLP with DeepFool using different values of epsilon, the perturbation stays constant: even if I pass a maximum perturbation (epsilon) of 0.001 to the attack, the average perturbation of the adversarial samples is 0.7. I show the code and the corresponding output below.
The attack
```
import matplotlib.pyplot as plt
import numpy as np
from art.attacks.evasion import DeepFool

epsilon_list = [0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]
max_iter = 10
acc = []
pert = []

for eps in epsilon_list:
    attack = DeepFool(classifier=classifier, epsilon=eps, max_iter=max_iter, verbose=False)
    test_samples_adv = attack.generate(X_test_copy)
    loss_test, accuracy_test = MLP.evaluate(test_samples_adv, Y_test)
    perturbation = np.mean(np.abs(test_samples_adv - X_test_copy))
    print('Accuracy on adversarial test data: {:4.2f}%'.format(accuracy_test * 100))
    print('Average perturbation: {:4.2f}'.format(perturbation))
    acc.append(accuracy_test)
    pert.append(perturbation)

x = np.array(pert)
y = np.array(acc)
plotting_curves(x, y)
```
plotting_curves is just a helper that plots a graph given x and y.
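For completeness, it is roughly the following (a minimal sketch; the labels are just illustrative):

```
import matplotlib.pyplot as plt

def plotting_curves(x, y):
    # Accuracy on adversarial data as a function of the average perturbation
    plt.plot(x, y, marker='o')
    plt.xlabel('Average perturbation')
    plt.ylabel('Accuracy')
    plt.show()
```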
These are the results: [accuracy vs. average perturbation](https://i.stack.imgur.com/WMGBp.png)
Can anybody explain whether this behavior makes sense, and why?