
I am predicting 20 numbers for a regression task from a VGG network, using lasagne and theano. In the example script I wrote, the number of images is 100. I think I am doing something stupid, but I am stuck.

Looking online, people who use nolearn can fix this by specifying regression=True, but I am just using lasagne.

So:

('X.shape', (100, 3, 224, 224))
('y.shape', (100, 20))

Here's the exact error message

Traceback (most recent call last):
  File "script_1.py", line 167, in <module>
    loss = train_batch()
  File "script_1.py", line 90, in train_batch
    return train_fn(X_tr[ix], y_tr[ix])
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/compile/function_module.py", line 786, in __call__
    allow_downcast=s.allow_downcast)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/tensor/type.py", line 177, in filter
    data.shape))
TypeError: ('Bad input argument to theano function with name "script_1.py:159"  at index 1(0-based)', 'Wrong number of dimensions: expected 1, got 2 with shape (16, 20).')

Here's the model

def build_model():
    net = {}
    net['input'] = InputLayer((None, 3, 224, 224))
    net['conv1'] = ConvLayer(net['input'], num_filters=96, filter_size=7, stride=2, flip_filters=False)
     ...............
    net['drop7'] = DropoutLayer(net['fc7'], p=0.5)
    net['fc8'] = DenseLayer(net['drop7'], num_units=20, nonlinearity=None)
    return net

Generators:

def batches(iterable, N):
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == N:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def train_batch():
    ix = range(len(y_tr))
    np.random.shuffle(ix)
    ix = ix[:BATCH_SIZE]
    return train_fn(X_tr[ix], y_tr[ix])

Relevant training snippet

X_sym = T.tensor4()
y_sym = T.ivector()
output_layer = net['fc8']
prediction = lasagne.layers.get_output(output_layer, X_sym)
loss = lasagne.objectives.squared_error(prediction, y_sym)
loss = loss.mean()
acc = T.mean(T.eq(T.argmax(prediction, axis=1), y_sym), dtype=theano.config.floatX)
params = lasagne.layers.get_all_params(output_layer, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.0001, momentum=0.9)

train_fn = theano.function([X_sym, y_sym], loss, updates=updates)
val_fn = theano.function([X_sym, y_sym], [loss, acc])
pred_fn = theano.function([X_sym], prediction)

for epoch in range(5):
    for batch in range(25):
       loss = train_batch()
.....
madratman

1 Answer


Your prediction output has shape (batchsize, 20), but your y_sym variable is an ivector, i.e. a vector of length batchsize. That matches the traceback: the argument at index 1 (y_sym) expects 1 dimension but receives an array of shape (16, 20). Even apart from that, you cannot compute a squared error term between these two quantities, since one is a matrix and the other is a vector, and their shapes do not align.

What are your regression targets? If you are predicting 20 numbers for each datapoint, y_sym should be a matrix (a float matrix such as T.matrix() for real-valued targets); then you can compute the squared error term.
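A minimal NumPy sketch of the shape issue, assuming the targets are real-valued and stored as a (batchsize, 20) array. The array names and random values are made up for illustration and are not from the question's script; the Theano code itself is not reproduced here.

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, num_outputs = 16, 20

# Stand-ins for the network output and the targets (random, illustrative only).
prediction = rng.rand(batch_size, num_outputs).astype("float32")
targets = rng.rand(batch_size, num_outputs).astype("float32")

# Matching (batch, 20) shapes: elementwise squared error, then the mean,
# mirroring the squared_error(...).mean() pattern in the question.
loss = np.mean((prediction - targets) ** 2)

# A length-16 vector cannot be lined up against a (16, 20) matrix.
vector_targets = rng.rand(batch_size).astype("float32")
try:
    prediction - vector_targets
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

print(loss, shapes_compatible)
```

With matrix-shaped targets the loss reduces to a single scalar, which is what the update rule needs.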

Another possibility is to give your last layer a sigmoid nonlinearity. You can then interpret your convolutional neural network as producing multi-label probabilities, and synthetically construct multi-label probability targets for the regression. For example, say a datapoint x has the labels 0, 1 and 10. You could create a length-20 vector where entries 0, 1 and 10 are 1-a for some small a, and the other entries are small positive numbers.
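The soft target construction described above could be sketched as follows. The function name, the choice a = 0.05, and the use of a itself for the "small positive" entries are assumptions made for illustration.

```python
import numpy as np

def soft_multilabel_target(labels, num_classes=20, a=0.05):
    """Return a length-num_classes vector with 1 - a at `labels` and a elsewhere.

    Hypothetical helper: using `a` for the non-label entries is an assumption.
    """
    target = np.full(num_classes, a, dtype="float32")
    target[list(labels)] = 1.0 - a
    return target

# The datapoint from the example, carrying labels 0, 1 and 10.
target = soft_multilabel_target([0, 1, 10])
print(target)
```

Stacking one such vector per datapoint gives a (batchsize, 20) float matrix that can be compared against the sigmoid outputs with a squared error.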

You could also switch your objective function to binary cross entropy. In this case we are no longer performing regression. However, the network still outputs a class score matrix, not a vector. This is typically the loss function used when detecting the presence of K different objects in an image (think cat, dog, human, car, bike, etc.), i.e. multi-label classification. If you want to try this route, change the last layer as follows:

net['fc8'] = DenseLayer(net['drop7'], num_units=20, nonlinearity=lasagne.nonlinearities.sigmoid)

The network can now be interpreted as outputting probabilities; each probability is a confidence score for whether the network believes that object class is in the image.

We will have the following loss function now:

X_sym = T.tensor4()
y_sym = T.imatrix() 
output_layer = net['fc8']
probabilities = lasagne.layers.get_output(output_layer, X_sym)
loss = lasagne.objectives.binary_crossentropy(probabilities, y_sym)
loss = loss.mean()
# ... the rest of the update stuff ...
train_fn = theano.function([X_sym, y_sym], loss, updates=updates)

The key difference here is that our target variable y_sym is now an imatrix. This must be a {0,1} valued (batchsize,K) sized matrix, where 0 represents the object not being present in the image and 1 represents the object being present in the image.
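As a rough illustration of that target format, here is a NumPy sketch that builds a {0,1} matrix from per-image label lists and evaluates the elementwise binary cross-entropy formula on it. The helper names, the example labels, and the eps clipping are assumptions for illustration; this is not the Lasagne implementation.

```python
import numpy as np

def multi_hot(label_lists, num_classes):
    """Build a {0,1} matrix of shape (batchsize, num_classes) from label lists."""
    targets = np.zeros((len(label_lists), num_classes), dtype="int32")
    for row, labels in enumerate(label_lists):
        targets[row, list(labels)] = 1
    return targets

def binary_crossentropy(probabilities, targets, eps=1e-7):
    """Elementwise -(t*log(p) + (1-t)*log(1-p)); the eps clipping is an added safeguard."""
    p = np.clip(probabilities, eps, 1.0 - eps)
    return -(targets * np.log(p) + (1 - targets) * np.log(1.0 - p))

# Two hypothetical images: one with objects 0, 1 and 10, one with object 3 only.
y = multi_hot([[0, 1, 10], [3]], num_classes=20)

# An uninformed network that outputs 0.5 everywhere.
probabilities = np.full(y.shape, 0.5)
loss = binary_crossentropy(probabilities, y).mean()
print(y.shape, loss)
```

The int32 dtype is chosen here because it corresponds to what an imatrix input accepts.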

To measure accuracy for multi-label prediction, one typically uses the F1-score, which scikit-learn can compute with f1_score. The validation function would be different in this case; it may look something like this:

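import numpy as np
from sklearn.metrics import f1_score

probabilities_fn = theano.function([X_sym], probabilities)
confidence_threshold = .5
# X_val and y_true are placeholders for validation images and their {0,1} label matrix;
# a compiled function takes numeric arrays, not the symbolic X_sym.
predictions = np.where(probabilities_fn(X_val) > confidence_threshold, 1, 0)
f1_score(y_true, predictions, average='micro')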

The above code snippet returns the probabilities/confidence scores from our network; any probability over the confidence_threshold parameter (here I choose .5) is interpreted as that class label being present.
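To make the thresholding and scoring step concrete without a trained network, here is a small NumPy sketch that computes a micro-averaged F1 by hand. The probability and label arrays are invented, and micro_f1 is a hypothetical helper that pools true positives, false positives and false negatives across all labels, which is the idea behind micro averaging.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over a {0,1} label matrix: counts are pooled over all entries."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)     # predicted present, actually present
    fp = np.sum(~y_true & y_pred)    # predicted present, actually absent
    fn = np.sum(y_true & ~y_pred)    # predicted absent, actually present
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0

# Invented confidence scores for 2 images and 3 classes.
probabilities = np.array([[0.9, 0.2, 0.7],
                          [0.4, 0.6, 0.1]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 1]])

confidence_threshold = 0.5
predictions = np.where(probabilities > confidence_threshold, 1, 0)
score = micro_f1(y_true, predictions)
print(predictions, score)
```

Raising or lowering confidence_threshold trades false positives against false negatives, so it is worth tuning on validation data.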

Indie AI
  • Thanks! I realized that last night and changed to a vector, but seems like I have messed up the accuracy now. – madratman Mar 28 '16 at 18:32
  • Unable to fix. Can you please help with the correct expression for the accuracy (it's classification accuracy in the snippet, which is wrong)? I think I am messing up the axis. – madratman Mar 28 '16 at 19:17
  • What did you switch to a vector? If you are trying to classify images which can have multiple objects, say K total classes (cat, dog, person, bicycle, car, etc), for each datapoint you need to output K class scores, so the output of your model should be a matrix. If you want to keep track of accuracy, I don't believe argmax is appropriate anymore as that will restrict you to predicting only 1 class per image. You may want to use something like the F1 score, which is typically used for multi-label image classification. – Indie AI Mar 28 '16 at 22:03
  • Oops. I meant a matrix, that is, I changed y_sym to a matrix. Thanks for updating the answer. I am trying to predict x and y coordinates of 10 keypoints. Hence I need regression. I am not sure if the multi class paradigm is the right way to go about it. What I need to fix is the accuracy for regression output in my code snippet for that. What do you think should be a good measure in this case? Maybe I should also scale my labels? (as done here http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/). – madratman Mar 30 '16 at 03:53
  • Or maybe you are right. Remember the "regression=True" + nolearn comment in the beginning of my question? This http://stackoverflow.com/questions/32654026/convolutional-neural-network-accuracy-with-lasagne-regression-vs-classification?rq=1 suggests that in case of regression=True, a nolearn network outputs the probabilities of each class, and computes the loss with the squared error on the output vector. "ctrl+f" regression here https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/base.py – madratman Mar 30 '16 at 04:01
  • Alright, so I changed acc to T.mean(T.eq(prediction, y_sym), dtype=theano.config.floatX), following this line: https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/base.py#L480. It starts to train with zero accuracy in the beginning. (epoch, loss, acc) looks like (0, 0.083694695249984144, 0.0) (1, 0.082823878840396284, 0.0) (2, 0.080720447788113048, 0.0) (3, 0.082895943208744646, 0.0). This is when I have scaled the keypoint coordinates to 0 to 1 (x = x/width, y = y/height), since the origin is at the top left (opencv convention). – madratman Mar 30 '16 at 04:19
  • Maybe my accuracy should be something like acc = T.mean(T.sum(T.sqr(prediction - y_sym), axis=1), axis=0, dtype=theano.config.floatX) Also, tried nolearn, but that gives nans as well. Here's the code https://gist.github.com/madratman/2c799f8023eeb7cb3910985b4d6f541b – madratman Mar 30 '16 at 06:43
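
For the keypoint regression discussed in these comments, exact equality between float predictions and targets is almost never satisfied, which would explain an accuracy of 0.0 even while the loss decreases. A distance-based measure is one possible alternative. The sketch below is a NumPy illustration under the assumption that each row is laid out as x0, y0, x1, y1, and so on; the function name and the data are hypothetical.

```python
import numpy as np

def mean_keypoint_distance(prediction, targets):
    """Mean Euclidean distance per keypoint for (batch, 20) arrays.

    Assumes each row is laid out as x0, y0, x1, y1, ... (an assumption,
    not something stated in the thread).
    """
    pred = np.asarray(prediction, dtype="float64").reshape(len(prediction), -1, 2)
    true = np.asarray(targets, dtype="float64").reshape(len(targets), -1, 2)
    return np.sqrt(((pred - true) ** 2).sum(axis=2)).mean()

# Made-up targets, and predictions that are off by (0.03, 0.04) at every keypoint.
targets = np.zeros((4, 20))
prediction = targets.copy()
prediction[:, 0::2] += 0.03   # shift every x coordinate
prediction[:, 1::2] += 0.04   # shift every y coordinate

exact_match = np.mean(prediction == targets)             # equality-based "accuracy"
distance = mean_keypoint_distance(prediction, targets)   # distance-based measure
print(exact_match, distance)
```

With coordinates scaled to the 0 to 1 range, the distance is expressed as a fraction of the image size, which makes it easier to read than the raw squared error.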