CNN Image Recognition with Regression Output on Tensorflow

Question

I want to predict an estimated wait time from images using a CNN. I imagine this means having the CNN produce a regression-type output trained with an RMSE loss, which is what I am using right now, but it is not working properly.

Can someone point me to examples that use CNN image recognition to produce a scalar/regression output (instead of a class output), similar to wait time, so that I can use their techniques to get this working? I haven't been able to find a suitable example.

All of the CNN examples that I found are for the MNIST data or for distinguishing between cats and dogs, which produce a class output, not a number/scalar output like wait time.

Can someone give me an example, using TensorFlow, of a CNN giving a scalar or regression output based on image recognition?

Thanks so much! I am honestly super stuck and making no progress, and I have been working on this same problem for over two weeks.

Imagine an image with traffic and one without traffic, or similar examples with lines and no lines. I basically just want a number output though. — Ic3MaN911, Aug 06 '17 at 04:06
An image with traffic versus no traffic is a 2-class problem. What wait time do you want to predict there from a single image? — Vijay Mariappan, Aug 06 '17 at 04:31
Please understand, it was just an example, I am working on stuff that is under NDA so I was just giving a similar example. The most important part is the regression output, the waiting time is just an example. I am just looking for examples with a scalar output. Trust me, it is not a 2 class problem, when I ran the sum(unique) for my target variable: it gave me a 4739753 as the answer so there are 4739753 unique classes, which is why I want a regression output. I also have tens of thousands of unique images. — Ic3MaN911, Aug 06 '17 at 05:04
Having so many classes doesn't make it a regression problem. — Vijay Mariappan, Aug 06 '17 at 05:08
So what do you suggest, should I use 4739753 classes? There is actually an infinite number since I am dealing with time. — Ic3MaN911, Aug 06 '17 at 05:21
This is a classic example of how hard it can be to find answers on stack overflow. Often people will fight you about the technicality of how you worded the question or the details of a broad example intended to illustrate a point, and never actually address your main question. — rodrigo-silveira, Mar 21 '18 at 22:52

j314erre · Answer 1 · 2020-12-30T15:43:26.720

Check out the Udacity self-driving-car models, which take an input image from a dash cam and predict a steering angle (i.e. a continuous scalar) to stay on the road. They usually use a regression output after one or more fully connected layers on top of the CNN layers.

https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models

Here is a typical model:

https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/autumn

It uses tf.atan() to get the final output y; you can also use tf.tanh() or just a linear output.
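To make the three output choices concrete, here is a minimal pure-Python sketch of how each one maps the last layer's raw value z to a prediction. The function names and the `scale` parameter are hypothetical and are not taken from the Udacity models. atan and tanh squash the output into a fixed range, which suits a steering angle, while a linear output leaves it unbounded.

```python
import math

def atan_output(z, scale=1.0):
    # Stays within [-scale * pi/2, scale * pi/2]
    return scale * math.atan(z)

def tanh_output(z, scale=1.0):
    # Stays within [-scale, scale]
    return scale * math.tanh(z)

def linear_output(z):
    # No squashing: the prediction can be any real number
    return z

# Compare the three mappings on a few raw values
for z in (-100.0, 0.0, 100.0):
    print(z, atan_output(z), tanh_output(z), linear_output(z))
```

If your target is bounded, pick `scale` so the bounded range covers it. For a target with no natural upper limit, such as a wait time, the linear option avoids clipping large values.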

Use MSE for your loss function.
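Since the question mentions RMSE: RMSE is just the square root of MSE, so the same predictions minimize both. A common pattern is to train on MSE and report RMSE because it is in the units of the target. Here is a small pure-Python sketch of the two quantities; the numbers are made up for illustration.

```python
import math

def mse(y_true, y_pred):
    # Mean of the squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root of MSE, expressed in the same units as the target
    return math.sqrt(mse(y_true, y_pred))

wait_minutes = [10.0, 20.0, 30.0]  # made-up targets
predictions = [12.0, 18.0, 33.0]   # made-up model outputs

print(mse(wait_minutes, predictions))   # (4 + 4 + 9) / 3
print(rmse(wait_minutes, predictions))
```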

Here is another example, in Keras:

from tensorflow.keras import layers, models, optimizers

model = models.Sequential()
# Three conv/pool stages extract features from 32x128 RGB images
model.add(layers.Conv2D(16, (3, 3), activation='relu', input_shape=(32, 128, 3)))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
# Fully connected layers funnel the features down to one value
model.add(layers.Flatten())
model.add(layers.Dense(500, activation='relu'))
model.add(layers.Dropout(.5))
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dropout(.25))
model.add(layers.Dense(20, activation='relu'))
# Single unit with no activation: the scalar regression output
model.add(layers.Dense(1))
model.compile(optimizer=optimizers.Adam(learning_rate=1e-04), loss='mean_squared_error')

The key difference from the MNIST examples is that instead of funneling down to an N-dim vector of logits fed into softmax with cross-entropy loss, for a regression output you funnel down to a 1-dim vector with MSE loss. (You can also mix multiple classification and regression outputs in the final layer, as in YOLO object detection.)
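The contrast between the two kinds of head can be sketched in pure Python. This is a toy illustration with made-up feature values and weights, not code from any of the linked models. The classification head produces N logits that go through softmax and cross-entropy, while the regression head produces one scalar that goes straight into a squared error.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    # Classification head: N logits -> softmax -> negative log-likelihood
    return -math.log(softmax(logits)[true_class])

def squared_error(prediction, target):
    # Regression head: one scalar -> squared error
    return (prediction - target) ** 2

features = [0.5, -1.0, 2.0]  # stand-in for the flattened CNN features

# Classification head: one weight row per class (3 classes here)
class_weights = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
logits = [sum(w * x for w, x in zip(row, features)) for row in class_weights]
print(cross_entropy(logits, true_class=1))

# Regression head: a single weight row and no softmax
reg_weights = [1.0, 2.0, 3.0]
prediction = sum(w * x for w, x in zip(reg_weights, features))
print(squared_error(prediction, target=5.0))
```

Everything before the head (the convolution and pooling layers) can stay the same; only the size of the last layer and the loss change.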

score 2 · Answer 2 · edited Apr 09 '18 at 01:13

The key is to have NO activation function in your last Fully Connected (output) layer. Note that you must have at least 1 FC layer beforehand.
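To see why the output activation matters, here is a toy sketch (a single output unit trained by gradient descent, not a CNN; the target value and hyperparameters are made up). With a linear output the unit can reach a target of 42, while a sigmoid output can never exceed 1 no matter how long it trains.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_output(activation, derivative, target, steps=200, lr=0.1):
    # Gradient descent on the squared error of a single output unit
    z = 0.0
    for _ in range(steps):
        y = activation(z)
        grad = 2.0 * (y - target) * derivative(z)
        z -= lr * grad
    return activation(z)

target = 42.0  # made-up wait time in minutes

# Linear output: no activation, derivative is 1
linear_pred = fit_output(lambda z: z, lambda z: 1.0, target)

# Sigmoid output: squashed into (0, 1), so it cannot reach the target
sigmoid_pred = fit_output(
    sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z)), target
)

print(linear_pred)
print(sigmoid_pred)
```

The same reasoning applies to a ReLU on the output, which cannot produce negative values, so it only fits when the target is known to be non-negative.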

answered Apr 09 '18 at 00:32 by Edd
