As part of a personal project I'm trying to modify the example code from the Theano documentation (the Multilayer Perceptron tutorial) to work with my own data.

So far I have managed to bring my own (text) data into the required format, and I want to build a binary classifier. The problem is that when I set the number of outputs to 1, i.e.

classifier = MLP(rng=rng, input=x, n_in=49, n_hidden=n_hidden, n_out=1)

I get the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Asterios\Anaconda\lib\site-packages\spyderlib\widgets\externalshell  \sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/Asterios/Documents/Python/TripAdvisor/untitled4.py", line 603, in <module>
params = test_mlp()
File "C:/Users/Asterios/Documents/Python/TripAdvisor/untitled4.py", line 553, in test_mlp
minibatch_avg_cost = train_model(minibatch_index)
File "C:\Users\Asterios\Anaconda\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 588, in __call__
self.fn.thunks[self.fn.position_of_error])
File "C:\Users\Asterios\Anaconda\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 579, in __call__
outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs shapes: [(10L, 1L), (1L,), (10L,)]
Inputs strides: [(8L, 8L), (8L,), (4L,)]
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.

The labels of my training data (before casting to a Theano shared variable) look like this:

array([1, 1, 1, ..., 0, 0, 0], dtype=int64)

The strange thing is that if I use any number of output neurons above 1 (e.g. n_out=2), the code runs without any errors, but then there are extra output neurons that seem to have no practical meaning.

Could someone please explain why the code with a single binary output gives me this error? How can I get this working?

Thank you!

Stergios
  • I'm not really familiar with Theano, but maybe it wants two output units for binary classification? Softmax on two units is equivalent to logistic with one (except the regularizer may turn out to be a bit different, due to there being more hidden->output weights). – Fred Foo May 22 '14 at 22:14

1 Answer


The logistic regression class used as the output layer in the MLP tutorial is not "standard" logistic regression, which outputs a single value and discriminates between just two classes, but rather multinomial logistic regression (a.k.a. softmax regression), which outputs one value per class, giving the probability that the input belongs to that class. So, if you have 10 classes, you also need 10 output units, and the outputs sum to 1, since they form a probability distribution.
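
As a quick illustration (plain NumPy, not the tutorial code): with n_out classes the softmax layer produces one probability per class, and the integer label must lie in [0, n_out), which is exactly why a label of 1 is "out of bounds" when n_out=1:

import numpy as np

def softmax(z):
    # subtract the row max for numerical stability, then normalize
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# hypothetical scores for a minibatch of 3 examples and n_out = 2 classes
scores = np.array([[ 2.0, -1.0],
                   [-0.5,  0.5],
                   [ 0.1,  0.1]])
probs = softmax(scores)          # shape (3, 2), each row sums to 1
print(probs.sum(axis=1))         # [1. 1. 1.]

# labels are integer class indices in [0, n_out); with n_out = 1 the only
# legal label is 0, so y = 1 triggers "y_i value out of bounds"
y = np.array([0, 1, 1])
print(probs[np.arange(3), y])    # probability assigned to the true class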

Despite the class name used ("LogisticRegression"), its docstring in the linked source code leaves no doubt about its real intent ('''Multi-class Logistic Regression Class [...]''').

Since your problem has two classes, you also need 2 output units, so the value of n_out must be 2 instead of 1. Of course, with two classes the value of one output will always be 1 minus the value of the other.
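
A minimal sketch of the fix, assuming the MLP class from the tutorial (and, if I remember its attribute names correctly, its logRegressionLayer output layer); your 0/1 labels can stay exactly as they are:

# two output units instead of one
classifier = MLP(rng=rng, input=x, n_in=49, n_hidden=n_hidden, n_out=2)

# p_y_given_x is a (batch, 2) matrix of class probabilities;
# column 1 is P(y = 1 | x) and the predicted class is its argmax
p_positive = classifier.logRegressionLayer.p_y_given_x[:, 1]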

Also, check whether you really need int64 instead of int32 for the labels; Theano has much better support for the latter.
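
You can do the cast when you build the shared variables, along the lines of the tutorial's load_data function (a sketch; data_y here stands for your int64 label array):

import numpy
import theano
import theano.tensor as T

def shared_labels(data_y, borrow=True):
    # store as floatX so the values can live on the GPU,
    # then cast the symbolic variable back to int32 for the cost
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX),
                             borrow=borrow)
    return T.cast(shared_y, 'int32')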

Saul Berardo
  • Great!! Thanks Berardo, this must be the problem. The easiest solution would be to select the output with the highest value. However, I think it would be better to have a real binary logistic regression instead of a multinomial one. I searched without success for some tried code for a Theano binary logistic regression. Do you happen to have a link to such an implementation? – Stergios May 23 '14 at 10:18
  • In what sense do you think it would be better? Better accuracy? Faster? I don't think there is really any advantage in "real binary logistic regression". If you just need something that looks like it externally, just create a wrapper class around the softmax and always return the same single number from each prediction (see the sketch after these comments). – Saul Berardo May 23 '14 at 17:59
  • I think it will be a bit faster since one fewer weight needs to be calculated. I'm not sure how much it will affect total running time, but anyway. I'll try both and check if there's any visible difference. Thanks again! – Stergios May 23 '14 at 18:35
  • If you are running on a GPU you could add dozens of other units and there would be no difference in the running time. If you are on a CPU, Theano is not the best option for plain old logistic regression anyway. Let us know the results of your experiments. And please don't forget to come back and flag the right answer. Good luck. – Saul Berardo May 23 '14 at 20:02
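
In case it helps, this is the kind of thin wrapper I mean (just a sketch; BinaryWrapper is an illustrative name, and it assumes the tutorial's LogisticRegression layer trained with n_out=2):

class BinaryWrapper(object):
    """Present a 2-class softmax layer as a single 'probability of class 1'."""
    def __init__(self, softmax_layer):
        self.layer = softmax_layer
        # keep only the second column, P(y = 1 | x); P(y = 0 | x) is 1 minus this
        self.p_positive = softmax_layer.p_y_given_x[:, 1]
        # predicted label: 1 whenever P(y = 1 | x) > 0.5
        self.y_pred = self.p_positive > 0.5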