How to add regularizations in TensorFlow?

Question

In much of the available neural network code implemented with TensorFlow, I have found that regularization terms are often implemented by manually adding an extra term to the loss value.

My questions are:

1. Is there a more elegant or recommended way of regularization than doing it manually?
2. I also find that get_variable has an argument regularizer. How should it be used? According to my observation, if we pass a regularizer to it (such as tf.contrib.layers.l2_regularizer), a tensor representing the regularization term will be computed and added to a graph collection named tf.GraphKeys.REGULARIZATION_LOSSES. Will that collection be used automatically by TensorFlow (e.g. by optimizers when training)? Or am I expected to use that collection myself?

just to be super explicit, is the way to do it `S = tf.get_variable(name='S', regularizer=tf.contrib.layers.l2_regularizer )`? — Charlie Parker, Aug 07 '16 at 03:57
@Euler_Salter I don't remember anymore, sorry! Not using tensor flow anymore! — Charlie Parker, Nov 13 '17 at 18:43

score 70 · Accepted Answer · edited Dec 25 '22 at 01:04

As you say in the second point, using the regularizer argument is the recommended way. You can use it in get_variable, or set it once in your variable_scope and have all your variables regularized.

The losses are collected in the graph, and you need to manually add them to your cost function like this.

  reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
  reg_constant = 0.01  # Choose an appropriate one.
  loss = my_normal_loss + reg_constant * sum(reg_losses)
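
To make the pattern concrete without depending on a particular TensorFlow version, here is a framework-free sketch in plain Python. The `REG_LOSSES` list and the `make_variable` helper are hypothetical stand-ins for the graph collection and for `get_variable`; they are not TensorFlow APIs. The sketch assumes each penalty is already scaled when it is recorded, so no extra constant is applied at the end.

```python
# Plain-Python stand-in for "collect penalties, then add them to the loss".
# REG_LOSSES and make_variable are hypothetical names, not TensorFlow APIs.

REG_LOSSES = []  # plays the role of the REGULARIZATION_LOSSES collection


def l2_penalty(scale):
    """Return a function computing scale * sum(w**2) / 2 for a list of weights."""
    def penalty(weights):
        return scale * sum(w * w for w in weights) / 2.0
    return penalty


def make_variable(values, regularizer=None):
    """Create a 'variable' (a list of floats) and record its penalty, if any."""
    if regularizer is not None:
        REG_LOSSES.append(regularizer(values))
    return values


w1 = make_variable([1.0, -2.0], regularizer=l2_penalty(0.5))
w2 = make_variable([3.0], regularizer=l2_penalty(0.5))
b = make_variable([0.1])  # no regularizer, so nothing is recorded

data_loss = 0.5  # placeholder for the ordinary loss
total_loss = data_loss + sum(REG_LOSSES)
```

Nothing is added to the loss unless the recorded penalties are summed explicitly, which mirrors the point made in this answer.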

edited Dec 25 '22 at 01:04

starball

answered May 10 '16 at 15:47

Lukasz Kaiser

2

Thanks man. I was thinking TensorFlow would have some more intelligent ways of handling reg terms than manually do them, seems not :P – Lifu Huang May 11 '16 at 07:27
14

BTW, two suggestions, correct me if I am wrong. (1), I guess `reg_constant` might not be necessary, since regularizers in TensorFlow have an argument `scale` in their constructors so that the impact of reg terms can be controlled in a more fine-grained manner. And (2) using `tf.add_n` might be slightly better than `sum`, I guess using sum might create many tensors in graph to store intermediate result. – Lifu Huang May 11 '16 at 08:58
1

So just to make it super clear: after I pass the regularizer to the variable, `S = tf.get_variable(name='S', regularizer=tf.contrib.layers.l2_regularizer )`, do I then use the code you suggested? As in `sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))`? – Charlie Parker Aug 07 '16 at 04:19
You do not need to multiply the regularization loss(es) with a regularization constant. When using tf.get_variable() you can specify a scale to your regularizer (essentially the lambda) and tensorflow will automatically do the multiplication of the loss with that scale/2 – mirceamironenco Nov 19 '16 at 17:53
1

Could you show how to make the weight variables part of the collection retrievable by tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)? – Yu Shen Jan 11 '17 at 04:44
3

It seems like `tf.reduce_sum` should be used instead of `sum`? – ComputerScientist May 23 '17 at 23:44
Can you please provide an example? Normally I do `W1 = tf.Variable(tf.truncated_normal([n,m], stddev = 0.01))` where should I add the regularizer? – Euler_Salter Nov 13 '17 at 16:49
1

@Euler_Salter - `tf.get_variable(...)` is apparently now the preferred way to create a variable, although `tf.Variable(...)` is still supported. – Scott Smith Feb 04 '18 at 02:03
@ScottSmith Do you know where I could get more information on the topic? – Euler_Salter Feb 04 '18 at 17:46
1

@Euler_Salter - This is pretty good for regularization: https://greydanus.github.io/2016/09/05/regularization/ As for `tf.Variable()` vs `tf.get_variable()`, this seems to cover it pretty well: https://stackoverflow.com/questions/37098546/difference-between-variable-and-get-variable-in-tensorflow/37099025 – Scott Smith Feb 05 '18 at 19:00
@ScottSmith Thank you a lot! It seems to me very hard to learn tensorflow because most tutorials online assume that you already know it, or that you don't need to know tensorflow in depth. Can I ask you how you learned it? – Euler_Salter Feb 06 '18 at 08:24
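
On the `reg_constant` question raised in the comments above: `tf.contrib.layers.l2_regularizer(scale)` computes `scale * tf.nn.l2_loss(weights)`, i.e. `scale * sum(w**2) / 2`, so the scale already acts as the regularization constant. The plain-Python sketch below (the helper name is hypothetical and no TensorFlow is needed) shows that multiplying by an additional constant just multiplies the two factors together.

```python
def l2_penalty(weights, scale):
    # Same arithmetic as scale * tf.nn.l2_loss(weights): scale * sum(w**2) / 2
    return scale * sum(w * w for w in weights) / 2.0


weights = [2.0, 4.0]                       # sum of squares = 20
penalty = l2_penalty(weights, scale=0.25)  # 0.25 * 20 / 2

reg_constant = 0.5
double_scaled = reg_constant * penalty     # extra constant applied on top

# Identical to using a single, smaller scale from the start.
effective = l2_penalty(weights, scale=0.25 * reg_constant)
```

If a scale is set on the regularizer, an extra `reg_constant` is therefore optional; using both is harmless only if the product is the strength you intend.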

score 47 · Answer 2 · answered Apr 10 '17 at 16:50

A few aspects of the existing answer were not immediately clear to me, so here is a step-by-step guide:

1. Define a regularizer. This is where the regularization constant can be set, e.g.:
```
regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
```
2. Create variables via:
```
weights = tf.get_variable(
    name="weights",
    regularizer=regularizer,
    ...
)
```
Equivalently, variables can be created via the regular `weights = tf.Variable(...)` constructor, followed by `tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, weights)`.
3. Define some loss term and add the regularization term:
```
reg_variables = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
reg_term = tf.contrib.layers.apply_regularization(regularizer, reg_variables)
loss += reg_term
```
Note: It looks like tf.contrib.layers.apply_regularization is implemented as an AddN, so it is more or less equivalent to sum(regularizer(v) for v in reg_variables).

answered Apr 10 '17 at 16:50

bluenote10

10

I think you're applying the regularizer twice - both in step 2 and step 3. `apply_regularization` shouldn't be necessary if you already specified the regularizer when creating the variable. – interjay Apr 16 '17 at 19:02
2

@interjay please make an example, all these answers are super unclear! This is because there is always at least one person writing a comment underneath saying that the above answer has something wrong. – Euler_Salter Nov 27 '17 at 16:49
1

@interjay I'm pretty sure that doing both was necessary the last time I tested this. I'm not sure if this has changed though. – bluenote10 Nov 27 '17 at 17:27
1

No, that makes no sense because then you wouldn't need to pass the same regularizer to two functions. The documentation (and the name) makes it clear that `REGULARIZATION_LOSSES` is the total loss returned from the regularizers, so you are essentially calling `regularizer(regularizer(weight))`. – interjay Nov 27 '17 at 17:39
1

I think the confusion here stems from the "equivalently" part. He describes two different methods and you pick one; it's not one method that involves applying regularization twice. – gcp Jan 31 '18 at 09:33
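
The disagreement in these comments can be checked with arithmetic alone. Assuming the L2 penalty is `scale * sum(w**2) / 2`, the hedged plain-Python sketch below compares applying the regularizer once to the weights with applying it a second time to the already-computed scalar penalty. The second case is what happens if the collection already holds penalties (because a regularizer was passed to `get_variable`) and is then passed through `apply_regularization` again. The helper name and the numbers are made up for illustration.

```python
def l2_penalty(values, scale=0.5):
    # scale * sum(v**2) / 2, the same arithmetic as an L2 regularizer
    return scale * sum(v * v for v in values) / 2.0


weights = [2.0, 2.0]

once = l2_penalty(weights)   # regularizer applied to the weights
twice = l2_penalty([once])   # regularizer applied again to the scalar penalty
```

The two values differ, so which one you get depends on whether the collection holds raw weights (the `tf.Variable` plus `add_to_collection` route) or computed penalties (the `get_variable` route).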

score 29 · Answer 3 · edited Mar 31 '18 at 13:35

I'll provide a simple correct answer since I didn't find one. You need two simple steps, the rest is done by tensorflow magic:

Add regularizers when creating variables or layers:

tf.layers.dense(x, kernel_regularizer=tf.contrib.layers.l2_regularizer(0.001))
# or
tf.get_variable('a', regularizer=tf.contrib.layers.l2_regularizer(0.001))

Add the regularization term when defining loss:

loss = ordinary_loss + tf.losses.get_regularization_loss()

edited Mar 31 '18 at 13:35

pkuderov

answered Jan 03 '18 at 11:05

alyaxey

If I am creating a regularizer op by regularizer = tf.contrib.layers.l2_regularizer(0.001), can I pass it to multiple layer initiations? Or do I need to create a separate regularizer for each layer, like regularizer1 = tf.contrib.layers.l2_regularizer(0.001), regularizer2 = ................. regularizer3 = ...... and so on? – figs_and_nuts Apr 07 '18 at 06:21
@Nitin You can use the same regularizer. It's just a python function that applies loss to weights as its argument. – alyaxey Apr 07 '18 at 10:57
1

This looks like the most elegant solution but does this really work? How is this different from say reg_variables = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) reg_term = tf.contrib.layers.apply_regularization(regularizer, reg_variables) loss += reg_term – GeorgeOfTheRF Jun 04 '18 at 19:54
1

I just want to mention that tf.contrib.layers.fully_connected can replace tf.layers.dense and, furthermore, add more functionalities. Refer to these: [this](https://stackoverflow.com/questions/44912297/are-tf-layers-dense-and-tf-contrib-layers-fully-connected-interchangeable/44913058), [this](https://www.tensorflow.org/api_docs/python/tf/layers/dense), and [this](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/fully_connected). – O. Salah Jul 31 '18 at 09:29

score 16 · Answer 4 · answered May 23 '17 at 23:55

Another option is to use the contrib.layers library, as follows, based on the Deep MNIST tutorial on the TensorFlow website. First, assuming you've imported the relevant libraries (such as import tensorflow.contrib.layers as layers), you can define a network in a separate method:

def easier_network(x, reg):
    """ A network based on tf.contrib.layers, with input `x`. """
    with tf.variable_scope('EasyNet'):
        out = layers.flatten(x)
        out = layers.fully_connected(out, 
                num_outputs=200,
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = tf.nn.tanh)
        out = layers.fully_connected(out, 
                num_outputs=200,
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = tf.nn.tanh)
        out = layers.fully_connected(out, 
                num_outputs=10, # Because there are ten digits!
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = None)
        return out

Then, in a main method, you can use the following code snippet:

def main(_):
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])

    # Make a network with regularization
    y_conv = easier_network(x, FLAGS.regu)
    weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'EasyNet') 
    print("")
    for w in weights:
        shp = w.get_shape().as_list()
        print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
    print("")
    reg_ws = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 'EasyNet')
    for w in reg_ws:
        shp = w.get_shape().as_list()
        print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
    print("")

    # Make the loss function `loss_fn` with regularization.
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    loss_fn = cross_entropy + tf.reduce_sum(reg_ws)
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss_fn)

To get this to work you need to follow the MNIST tutorial I linked to earlier and import the relevant libraries, but it's a nice exercise to learn TensorFlow and it's easy to see how the regularization affects the output. If you apply a regularization as an argument, you can see the following:

- EasyNet/fully_connected/weights:0 shape:[784, 200] size:156800
- EasyNet/fully_connected/biases:0 shape:[200] size:200
- EasyNet/fully_connected_1/weights:0 shape:[200, 200] size:40000
- EasyNet/fully_connected_1/biases:0 shape:[200] size:200
- EasyNet/fully_connected_2/weights:0 shape:[200, 10] size:2000
- EasyNet/fully_connected_2/biases:0 shape:[10] size:10

- EasyNet/fully_connected/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_1/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_2/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0

Notice that the regularization collection contains three items: one scalar L2 penalty for each layer's weight matrix. The biases were not given a regularizer, so they do not appear.

With regularizations of 0, 0.0001, 0.01, and 1.0, I get test accuracy values of 0.9468, 0.9476, 0.9183, and 0.1135, respectively, showing the dangers of high regularization terms.

Really detailed example. – stackoverflowuser2010 Nov 26 '17 at 02:04

evantkchong · Answer 5 · 2019-03-25T08:29:17.520

If anyone's still looking, I'd just like to add that in tf.keras you can add weight regularization by passing regularizers as arguments to your layers. An example of adding L2 regularization, taken wholesale from the TensorFlow Keras Tutorials site:

model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

There's no need to manually add in the regularization losses with this method as far as I know.

Reference: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#add_weight_regularization

score 4 · Answer 6 · edited Jul 25 '18 at 13:14

I tested tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) and tf.losses.get_regularization_loss() with one l2_regularizer in the graph, and found that they return the same value. Judging by the magnitude of that value, I guess the regularization constant has already been applied through the scale parameter of tf.contrib.layers.l2_regularizer.

edited Jul 25 '18 at 13:14

BookOfGreg

answered Jul 25 '18 at 08:08

ocean

score 3 · Answer 7 · answered Dec 04 '18 at 11:31

If you have CNN you may do the following:

In your model function:

conv = tf.layers.conv2d(inputs=input_layer,
                        filters=32,
                        kernel_size=[3, 3],
                        # Pass a Xavier initializer object rather than the string 'xavier'.
                        kernel_initializer=tf.contrib.layers.xavier_initializer(),
                        kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-5),
                        padding="same",
                        activation=None)
...

In your loss function:

onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
regularization_losses = tf.losses.get_regularization_losses()
loss = tf.add_n([loss] + regularization_losses)

score 2 · Answer 8 · answered May 07 '18 at 07:51

cross_entropy = tf.losses.softmax_cross_entropy(
  logits=logits, onehot_labels=labels)

l2_loss = weight_decay * tf.add_n(
     [tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])

loss = cross_entropy + l2_loss

answered May 07 '18 at 07:51

Alex-zhai

1

Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you’ve made. – Maximilian Peters May 07 '18 at 08:36
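
As a hedged reading of the snippet in this answer: it bypasses the regularizer and collection machinery and builds the L2 term directly, penalizing every trainable variable (biases included) with `weight_decay * sum(v**2) / 2`. Here is a plain-Python sketch of the same arithmetic; the variable names and values are made up for illustration.

```python
def l2_loss(values):
    # Mirrors tf.nn.l2_loss: sum(v**2) / 2
    return sum(v * v for v in values) / 2.0


# Hypothetical "trainable variables": one weight vector and one bias.
trainable_variables = {
    "dense/kernel": [1.0, -3.0],
    "dense/bias": [2.0],
}

weight_decay = 0.5
cross_entropy = 0.25  # placeholder for the data loss

# Every trainable variable contributes, with no per-variable opt-out.
l2_term = weight_decay * sum(l2_loss(v) for v in trainable_variables.values())
loss = cross_entropy + l2_term
```

If biases should not be penalized, the list of variables would need to be filtered before summing.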

score 1 · Answer 9 · answered Apr 29 '18 at 09:08

Some of the answers left me more confused, so here are two methods, spelled out clearly.

# 1. Add all regularization terms by hand
var1 = tf.get_variable(name='v1', shape=[1], dtype=tf.float32)
var2 = tf.Variable(name='v2', initial_value=1.0, dtype=tf.float32)
regularizer = tf.contrib.layers.l1_regularizer(0.1)
reg_term = tf.contrib.layers.apply_regularization(regularizer, [var1, var2])
# Here reg_term is a scalar

# 2. Added and collected automatically, but requires get_variable
with tf.variable_scope('x',
        regularizer=tf.contrib.layers.l2_regularizer(0.1)):
    var1 = tf.get_variable(name='v1', shape=[1], dtype=tf.float32)
    var2 = tf.get_variable(name='v2', shape=[1], dtype=tf.float32)
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
# Here reg_losses is a list and should be summed

Either result can then be added to the total loss.
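
The practical difference between the two methods is the shape of what you get back: a single scalar in the first case, a list of per-variable scalars in the second. The plain-Python sketch below illustrates this, assuming the L1 penalty is `scale * sum(|w|)` and the L2 penalty is `scale * sum(w**2) / 2`. The helper names and values are hypothetical.

```python
def l1_penalty(values, scale):
    return scale * sum(abs(v) for v in values)


def l2_penalty(values, scale):
    return scale * sum(v * v for v in values) / 2.0


var1 = [1.0, -3.0]
var2 = [2.0]

# Method 1: apply one regularizer to an explicit list and get one scalar back.
reg_term = sum(l1_penalty(v, 0.5) for v in (var1, var2))

# Method 2: each variable records its own penalty; the list must be summed.
reg_losses = [l2_penalty(v, 0.5) for v in (var1, var2)]
reg_total = sum(reg_losses)
```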

score 1 · Answer 10 · answered Mar 25 '19 at 11:36

tf.GraphKeys.REGULARIZATION_LOSSES will not be added automatically, but there is a simple way to add them:

reg_loss = tf.losses.get_regularization_loss()
total_loss = loss + reg_loss

tf.losses.get_regularization_loss() uses tf.add_n to sum the entries of tf.GraphKeys.REGULARIZATION_LOSSES element-wise. tf.GraphKeys.REGULARIZATION_LOSSES will typically be a list of scalars, calculated using regularizer functions. It gets entries from calls to tf.get_variable that have the regularizer parameter specified. You can also add to that collection manually. That would be useful when using tf.Variable and also when specifying activity regularizers or other custom regularizers. For instance:

#This will add an activity regularizer on y to the regloss collection
regularizer = tf.contrib.layers.l2_regularizer(0.1)
y = tf.nn.sigmoid(x)
act_reg = regularizer(y)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, act_reg)

(In this example it would presumably be more effective to regularize x, as y really flattens out for large x.)
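
The parenthetical remark can be illustrated numerically. Since the sigmoid saturates at 1, an L2 penalty on `y` is bounded no matter how large `x` becomes, whereas the same penalty on `x` keeps growing. A small plain-Python sketch, assuming the penalty is `scale * sum(v**2) / 2` and using made-up inputs:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def l2_penalty(values, scale=0.1):
    return scale * sum(v * v for v in values) / 2.0


x = [10.0, 100.0]
y = [sigmoid(v) for v in x]

penalty_on_y = l2_penalty(y)  # cannot exceed scale * len(y) / 2
penalty_on_x = l2_penalty(x)  # grows with the square of the pre-activations
```

With these inputs the penalty on `y` stays near its upper bound while the penalty on `x` is several thousand times larger, so only the latter pushes large pre-activations back toward zero.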
