Before I give you my answer, let's think a bit about the rationale behind an auto-encoder (AE):
The purpose of an auto-encoder is to learn, in an unsupervised manner, something about the underlying structure of the input data. How does an AE achieve this goal? If it manages to reconstruct the input signal from its output signal (which is usually of lower dimension), it means that it did not lose information and that it effectively managed to learn a more compact representation.
In most examples it is assumed, for simplicity, that both the input signal and the output signal range in [0..1]. Therefore, the same non-linearity (sigmf) is applied both for obtaining the output signal and for reconstructing the inputs from the outputs.
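Something like:
sigmf = @(x) 1 ./ (1 + exp(-x)); % plain logistic sigmoid (shadows the toolbox sigmf, which needs extra parameters)
output = sigmf( W*input + b ); % compute output signal
reconstruct = sigmf( W'*output + b_prime ); % notice the different constant b_prime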
Then the AE learning stage tries to minimize the reconstruction error || input - reconstruct ||.
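To make the encode/reconstruct/error steps concrete, here is a minimal sketch in plain Python (standard library only). The helper names (`encode`, `decode`, `reconstruction_error`) and all the numbers in `W`, `b`, `b_prime` and `x` are made up for illustration; there is no training here, only a single forward pass with tied weights.

```python
import math

def sigmoid(z):
    # Logistic sigmoid, maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    # Matrix-vector product, M given as a list of rows.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def encode(W, b, x):
    # output = sigmoid(W * input + b)
    return [sigmoid(z + bi) for z, bi in zip(matvec(W, x), b)]

def decode(W, b_prime, h, g=sigmoid):
    # reconstruct = g(W' * output + b_prime); g is the reconstruction non-linearity
    return [g(z + bi) for z, bi in zip(matvec(transpose(W), h), b_prime)]

def reconstruction_error(x, r):
    # Euclidean norm || input - reconstruct ||
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, r)))

# Hypothetical toy values: 3 inputs in [0, 1], 2 hidden units.
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
b_prime = [0.0, 0.0, 0.0]
x = [0.2, 0.9, 0.4]

h = encode(W, b, x)               # lower-dimensional output signal
r = decode(W, b_prime, h)         # reconstruction of the input
err = reconstruction_error(x, r)  # the quantity training would minimize
```

Note that the error compares the reconstruction with the original input, which has the same dimension, and not with the lower-dimensional output.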
However, who said the reconstruction non-linearity must be identical to the one used for computing the output?
In your case, the assumption that the inputs range in [0..1] does not hold. Therefore, it seems that you need to use a different non-linearity for the reconstruction. You should pick one that agrees with the actual range of your inputs.
If, for example, your input ranges in (0..inf), you may consider using exp or ().^2 as the reconstruction non-linearity. You may use polynomials of various degrees, log, or whatever function you think may fit the spread of your input data.
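As a rough illustration of why the choice matters, the sketch below (plain Python, standard library only) reconstructs the same positive input with three candidate non-linearities. The weights, biases, hidden output `h` and input `x` are hypothetical numbers picked by hand, and `reconstruct` is an illustrative helper, so this only shows the range argument and is not a trained model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def square(z):
    return z * z

def reconstruct(W, b_prime, h, g):
    # Apply g elementwise to W' * h + b_prime (W stored as hidden-by-input rows).
    n_inputs = len(W[0])
    pre = [sum(W[j][i] * h[j] for j in range(len(h))) + b_prime[i]
           for i in range(n_inputs)]
    return [g(z) for z in pre]

def error(x, r):
    # Euclidean norm || input - reconstruct ||
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, r)))

# Hypothetical values: a positive input that is NOT confined to [0, 1].
W = [[0.5, 1.0, 2.0],
     [-0.4, 0.6, 1.5]]
b_prime = [-0.2, 0.3, 0.8]
h = [0.6, 0.3]
x = [0.9, 3.0, 12.0]

candidates = {"sigmoid": sigmoid, "exp": math.exp, "square": square}
results = {name: reconstruct(W, b_prime, h, g) for name, g in candidates.items()}
errors = {name: error(x, r) for name, r in results.items()}

for name in candidates:
    print(name, results[name], errors[name])
```

Whatever the weights are, the sigmoid reconstruction can never exceed 1, so it cannot get close to inputs such as 3.0 or 12.0, while exp and the square can reach any positive value.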
Disclaimer: I have never actually encountered such a case and have not seen this type of solution in the literature. However, I believe it makes sense and is at least worth trying.