Deciding if a function is linear or not is of course not a matter of opinion or debate; there is a very simple definition of a linear function, which is roughly:
f(a*x + b*y) = a*f(x) + b*f(y)

for every x and y in the function domain, with a and b constants.
The requirement "for every" means that, if we are able to find even a single example where the above condition does not hold, then the function is nonlinear.
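To make the definition concrete, here is a minimal Python sketch (the function name `check_linearity_at` is my own, purely for illustration); note that it can only *disprove* linearity, since a True result at a single point proves nothing:

```python
def check_linearity_at(f, x, y, a=1.0, b=1.0, tol=1e-9):
    """Test f(a*x + b*y) == a*f(x) + b*f(y) at a single point.

    Returns True if the condition holds at (x, y) up to `tol`.
    A False result is a counterexample proving f nonlinear;
    a True result proves nothing, since linearity must hold
    for *every* x, y, a, b.
    """
    return abs(f(a * x + b * y) - (a * f(x) + b * f(y))) < tol
```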
Assuming for simplicity that a = b = 1, let's try x = -5 and y = 1 with f being the ReLU function:
f(-5 + 1) = f(-4) = 0
f(-5) + f(1) = 0 + 1 = 1
So, for these x and y (in fact, for every x and y with x*y < 0) the condition f(x + y) = f(x) + f(y) does not hold, hence the function is nonlinear...
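In code (a quick sketch with ReLU defined inline, rather than taken from any particular framework):

```python
def relu(x):
    return max(x, 0.0)

# The counterexample above, with a = b = 1, x = -5, y = 1:
print(relu(-5 + 1))        # 0.0
print(relu(-5) + relu(1))  # 1.0 -> the two sides differ, so ReLU is nonlinear

# Equivalently, using the helper sketched earlier:
print(check_linearity_at(relu, -5, 1))  # False
```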
The fact that we may be able to find subdomains (e.g. both x and y being negative, or both positive, here) where the linearity condition holds is what defines some functions (such as ReLU) as piecewise-linear, which are nevertheless still nonlinear.
Now, to be fair to your question: if in a particular application the inputs happened to always be either all positive or all negative, then yes, in that case ReLU would in practice end up behaving like a linear function. But for neural networks this is not the case, hence we can indeed rely on it to provide the necessary non-linearity...