How to apply a Cleverhans attack when the final layer is not `softmax` (e.g. ensemble models)?

Question

I am trying to attack an ensemble of Keras models following the method proposed in this paper. In section 5, they note that the attack is of the form:

So, I moved on to create an ensemble of pretrained Keras MNIST models as follows:

def ensemble(models, model_input):

    outputs = [model(model_input) for model in models]
    y = Average()(outputs)

    model = Model(model_input, y, name='ensemble')

    return model

models = [...] # list of pretrained Keras MNIST models

model = ensemble(models, model_input)
model_wrapper = KerasModelWrapper(model)
attack_par = {'eps': 0.3, 'clip_min': 0., 'clip_max': 1.}
attack = FastGradientMethod(model_wrapper, sess=sess)

x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols,
                                      nchannels))
attack.generate(x, **attack_par) # ERROR!

At the final line, I get the following error:

----------------------------------------------------------
Exception                Traceback (most recent call last)
<ipython-input-23-1d2e22ceb2ed> in <module>
----> 1 attack.generate(x, **attack_par)

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/fast_gradient_method.py in generate(self, x, **kwargs)
     48     assert self.parse_params(**kwargs)
     49 
---> 50     labels, _nb_classes = self.get_or_guess_labels(x, kwargs)
     51 
     52     return fgm(

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/attack.py in get_or_guess_labels(self, x, kwargs)
    276       labels = kwargs['y_target']
    277     else:
--> 278       preds = self.model.get_probs(x)
    279       preds_max = reduce_max(preds, 1, keepdims=True)
    280       original_predictions = tf.to_float(tf.equal(preds, preds_max))

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/utils_keras.py in get_probs(self, x)
    188     :return: A symbolic representation of the probs
    189     """
--> 190     name = self._get_softmax_name()
    191 
    192     return self.get_layer(x, name)

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/utils_keras.py in _get_softmax_name(self)
    126         return layer.name
    127 
--> 128     raise Exception("No softmax layers found")
    129 
    130   def _get_abstract_layer_name(self):

Exception: No softmax layers found

It seems like it is a requirement that the final layer of the target model is a softmax layer. However, Fast Gradient Method technically doesn't need to have that as a requirement. Is this something that Cleverhans enforces for the ease of library implementation? Are there ways to get around this problem and use Cleverhans to attack models without the final softmax layer?

I thought of a potential way to fix this issue, which is to define a cleverhans Model class and define the probabilities and the logits explicitly. However, this is also problematic because logits of an ensemble model is not defined; there are logits for each model that are being ensembled, but since the ensemble happens after each logit is softmaxed, there's no overall logit. — alpaca, Aug 07 '19 at 22:55

score 1 · Accepted Answer · answered Aug 08 '19 at 17:11

The reason why CleverHans requires one to pass logits to the attacks is for numerical stability (e.g., so we don't take logs of exponents).

That said, attacking an ensemble is a legitimate use case. I can think of two options:

if all of your models have comparable logit distributions, you could average the logits and provide those to the attack object.
you could compute the adversary's loss on each of the N models within the ensemble, average all of these N adversarial losses, and then the attack would optimize this averaged loss.

The second option would require modifying the existing CleverHans API but if you would like to make a PR to the GitHub repo, I would be happy to help review it.

Hope this helps.

How to apply a Cleverhans attack when the final layer is not `softmax` (e.g. ensemble models)?

1 Answers1