Deciding if a function is linear or not is of course not a matter of opinion or debate; there is a very simple definition of a linear function, which is roughly:
f(a*x + b*y) = a*f(x) + b*f(y)

for every x and y in the function domain, with a and b constants.
The requirement "for every" means that, if we are able to find even a single example where the above condition does not hold, then the function is nonlinear.
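To make the definition concrete, here is a minimal Python sketch (the function name `check_linearity_at` is my own, purely for illustration); note that it can only *disprove* linearity, since a True result at a single point proves nothing:

```python
def check_linearity_at(f, x, y, a=1.0, b=1.0, tol=1e-9):
    """Test f(a*x + b*y) == a*f(x) + b*f(y) at a single point.

    Returns True if the condition holds at (x, y) up to `tol`.
    A False result is a counterexample proving f nonlinear;
    a True result proves nothing, since linearity must hold
    for *every* x, y, a, b.
    """
    return abs(f(a * x + b * y) - (a * f(x) + b * f(y))) < tol
```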
Assuming for simplicity that a = b = 1, let's try x = -5 and y = 1 with f being the ReLU function:
f(-5 + 1) = f(-4) = 0
f(-5) + f(1) = 0 + 1 = 1
So, for these x and y (in fact, for every x and y with x*y < 0) the condition f(x + y) = f(x) + f(y) does not hold, hence the function is nonlinear...
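In code (a quick sketch with ReLU defined inline, rather than taken from any particular framework):

```python
def relu(x):
    return max(x, 0.0)

# The counterexample above, with a = b = 1, x = -5, y = 1:
print(relu(-5 + 1))        # 0.0
print(relu(-5) + relu(1))  # 1.0 -> the two sides differ, so ReLU is nonlinear

# Equivalently, using the helper sketched earlier:
print(check_linearity_at(relu, -5, 1))  # False
```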
The fact that we may be able to find subdomains (e.g. both x and y being negative, or both positive, here) where the linearity condition holds is what defines some functions (such as ReLU) as piecewise-linear, which are nevertheless still nonlinear.
Now, to be fair to your question: if in a particular application the inputs happened to always be either all positive or all negative, then yes, in that case ReLU would in practice end up behaving like a linear function. But for neural networks this is not the case, hence we can indeed rely on it to provide the necessary non-linearity...