I'm trying to design a neural network using Keras with priority on prediction performance, and I cannot reduce the number of layers or nodes per layer any further without losing too much accuracy. I have noticed that a very large portion of my weights (>95%) are effectively zero. Is there a way to prune dense layers, in the hope of reducing prediction time?
-
What does "effectively zero" exactly mean? Which layer types do you use? What have you tried? – Martin Thoma Apr 04 '17 at 08:31
-
@MartinThoma I was using basic `Dense` layers. Most weights were either exactly zero or so close to zero that setting them to zero wouldn't change any output of the network for any input. However, no single node could be removed from the network without increasing the loss on the average test case. My assumption is that, once most weights are useless, a sparse network would be more efficient at prediction time. – Mirac7 Apr 06 '17 at 07:31
-
"were so close to zero that setting them to zero wouldn't change any output of the network for any input" - what does that mean? 10^-5? 10^-6? 10^-100? – Martin Thoma Apr 06 '17 at 07:33
-
@MartinThoma I think the 'threshold for irrelevance' was somewhere around `< 10^-15`, determined empirically (a quick way to measure this is sketched after these comments). – Mirac7 Apr 06 '17 at 07:46
-
You can prune neural networks with keras now – Mohit Motwani Jul 10 '19 at 12:19
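For concreteness, here is a minimal sketch of a sparsity check like the one described in the comments above (the helper name is illustrative, the `1e-15` threshold is the empirical one mentioned by the asker, and a `tf.keras` model is assumed):

```python
import numpy as np
from tensorflow import keras

def dense_sparsity(model, threshold=1e-15):
    """Print the fraction of effectively-zero weights per Dense layer."""
    for layer in model.layers:
        if isinstance(layer, keras.layers.Dense):
            kernel = layer.get_weights()[0]  # shape: (inputs, units)
            frac = np.mean(np.abs(kernel) < threshold)
            print(f"{layer.name}: {frac:.1%} of weights below {threshold}")
```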
4 Answers
Not a dedicated way :(
There's currently no easy (dedicated) way of doing this with Keras.
A discussion is ongoing at https://groups.google.com/forum/#!topic/keras-users/oEecCWayJrM.
You may also be interested in this paper: https://arxiv.org/pdf/1608.04493v1.pdf.

-
Very interesting paper, thank you. Do you have any insight into when (or if) this is going to be implemented in Keras? Or should I make a switch to a different framework? – Mirac7 Feb 01 '17 at 10:53
-
I don't believe changing frameworks would help much, honestly. As far as I know, neither TensorFlow nor Theano has this kind of feature implemented. You could nevertheless work something out manually: threshold the absolute values of your weights, remove the pruned weights from the layer, and, when an entire neuron dies, also remove the weights in the following layer that correspond to it (see the sketch after these comments). It doesn't sound really straightforward, but I don't think there are big secrets either. – grovina Feb 02 '17 at 11:01
-
https://www.tensorflow.org/model_optimization/guide/pruning is a dedicated way in Keras. As I suggest below, support for latency improvements is a work in progress. – Alan Chiao Jun 05 '19 at 02:41
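For what it's worth, a minimal sketch of the manual approach grovina describes, assuming two consecutive fully connected layers (the function name and the keep-rule are illustrative, not a library API):

```python
import numpy as np

def prune_hidden_units(dense_a, dense_b, threshold=1e-15):
    """Drop hidden units between two consecutive Dense layers.

    A hidden unit is removed when every weight leaving it (its row in
    dense_b's kernel) is effectively zero. Returns the pruned weight
    arrays; the caller builds new, smaller Dense layers around them.
    """
    w_a, b_a = dense_a.get_weights()   # (inputs, hidden), (hidden,)
    w_b, b_b = dense_b.get_weights()   # (hidden, outputs), (outputs,)

    # Threshold the absolute values of the weights.
    w_a[np.abs(w_a) < threshold] = 0.0
    w_b[np.abs(w_b) < threshold] = 0.0

    # Keep only hidden units that still influence the next layer.
    keep = np.any(w_b != 0.0, axis=1)
    return w_a[:, keep], b_a[keep], w_b[keep, :], b_b
```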
Take a look at Keras Surgeon: https://github.com/BenWhetton/keras-surgeon
I have not tried it myself, but the documentation claims that it has functions to remove or insert nodes.
Also, after looking at some papers on pruning, it seems that many researchers create a new model with fewer channels (or fewer layers), and then copy the weights from the original model to the new model.
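For illustration, a minimal sketch of what the `Surgeon` API from that repository is documented to look like (untested here; the toy model and channel indices are placeholders, and compatibility depends on your Keras version):

```python
import keras
from kerassurgeon import Surgeon

# Toy model standing in for the real one.
model = keras.models.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(100,),
                       name='hidden'),
    keras.layers.Dense(10, activation='softmax'),
])

# Indices of hidden units to delete -- in practice, pick them with a
# criterion such as the near-zero-weight test from the question.
surgeon = Surgeon(model)
surgeon.add_job('delete_channels', model.get_layer('hidden'),
                channels=[3, 17, 42])
pruned_model = surgeon.operate()  # new model with 61 hidden units
```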

-
Just tried pruning using Surgeon, and it was a breeze, as long as the pruning conditions are implemented by the user. It seems that there are no ready-made heuristics for pruning, just the tool itself. – Felix May 13 '19 at 09:19
See this dedicated tooling for tf.keras: https://www.tensorflow.org/model_optimization/guide/pruning
As the overview suggests, support for latency improvements is a work in progress.
Edit: Keras -> tf.keras based on LucG's suggestion.
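For reference, a minimal sketch of the pruning workflow that guide describes (the toy model, data, and schedule values here are placeholders):

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy data and model standing in for the real ones.
x = np.random.rand(256, 100).astype('float32')
y = np.random.rand(256, 1).astype('float32')
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(1),
])

# Gradually drive 90% of the weights to zero during training.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.9, begin_step=0, end_step=100)
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=schedule)

pruned.compile(optimizer='adam', loss='mse')
pruned.fit(x, y, epochs=15, batch_size=32,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before export; the weights stay sparse,
# but speedups still require a runtime that exploits sparsity.
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```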

-
Some of the model_optimization functions of tensorflow are actually dedicated to tf.keras and do not work with keras itself. – LucG May 16 '20 at 07:42
If you set an individual weight to zero, won't that prevent it from being updated during back-propagation? Shouldn't that weight remain zero from one epoch to the next? That's why you set the initial weights to nonzero values before training. If you want to "remove" an entire node, just set all of the weights on that node's output to zero, and that will prevent that node from having any effect on the output throughout training.

-
We init weights to non-zero also because we want gradients to be different for every neuron. A single weight at 0 still gets updated. – LucG May 16 '20 at 07:50
-
@LucG Yeah, you're right. You'd have to reset it to zero after each update, or create a mask to prevent it from being updated during back-propagation (a minimal sketch of the mask idea follows below). – hobs May 20 '20 at 17:57
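To make the mask idea concrete, a minimal sketch (the `FixedMask` class is illustrative, not a built-in) using a `kernel_constraint`, which Keras applies after each optimizer update, to hold pruned weights at zero:

```python
import numpy as np
import tensorflow as tf

class FixedMask(tf.keras.constraints.Constraint):
    """Multiplies the kernel by a fixed binary mask after each update,
    so pruned weights cannot drift away from zero."""
    def __init__(self, mask):
        self.mask = tf.constant(mask, dtype=tf.float32)

    def __call__(self, w):
        return w * self.mask

# Prune ~95% of a (100, 64) kernel at random; in practice the mask
# would come from a near-zero-weight criterion instead.
mask = (np.random.rand(100, 64) > 0.95).astype('float32')
layer = tf.keras.layers.Dense(64, input_shape=(100,),
                              kernel_constraint=FixedMask(mask))
```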