Can a neural network recognize a screen and replicate a finite set of actions?

Question

I have learned that neural networks can replicate any function.

Normally the neural network is fed with a set of descriptors to its input neurons and then gives out a certain score at its output neuron. I want my neural network to recognize certain behaviours from a screen. Objects on the screen are already preprocessed and clearly visible, so recognition should not be a problem.

Is it possible to use the neural network to recognize a pixelated picture of the screen and make decisions on that basis? The amount of training data would be huge, of course. Is there a way to teach the ANN by online supervised learning?

Edit: Because a commenter said the question was too general to be a programming problem: I would like to implement this in Python first, to see whether it works. If anyone could point me to a resource for doing this kind of online learning in Python, I would be grateful.

Yes, you can train a suitably large ANN to replicate any function given a suitable number of samples. However, it's much more likely that another tool is better suited to your problem. — yiding, Jan 28 '13 at 10:17
I guess SO is the wrong place for this question. SO is for concrete programming problems, not for finding solutions to general problems. Nevertheless: the theory says that an ANN can do almost everything (famous citation "There's an ANN for that."[citation needed] ;-) ). In practice, either the computational performance or the detection performance is often not really good. — Thorsten Kranz, Jan 28 '13 at 10:25

score 1 · Answer 1 · answered Feb 07 '13 at 18:02

I would suggest these resources:

http://www.neuroforge.co.uk/index.php/getting-started-with-python-a-opencv
http://docs.opencv.org/doc/tutorials/ml/table_of_content_ml/table_of_content_ml.html
http://blog.damiles.com/2008/11/the-basic-patter-recognition-and-classification-with-opencv/
https://github.com/bytefish/machinelearning-opencv

OpenCV is basically an image processing library, but it also has some amazing helper classes that you can use for almost any task. Its machine learning module is pretty easy to use, and you can go through the source to see the explanation and background theory for each function.

You could also use a Python machine learning library such as scikit-learn: http://scikit-learn.org/stable/

But before you feed the data from your screen (I'm assuming that's in pixels?) to your ANN, SVM, or whatever ML algorithm you choose, you need to perform "feature extraction" on that data, i.e. on the objects on the screen.

Feature extraction can be thought of as representing the same data on the screen with fewer numbers, so that there are fewer inputs to give to the ANN. You will need to experiment with different features before you find a combination that works well for your particular scenario. A sample feature vector could look something like this:

[x1, y1, x2, y2, ..., col]

This is basically a list of edge points that outline the area your object occupies, a sort of ROI (Region of Interest), followed by its color. Within each ROI you perform edge detection and color detection, and extract any other relevant characteristics. The important thing is that the shape and color information of all your objects is now represented by a number of these lists, one for each detected object.
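
A minimal sketch of that idea in plain Python, assuming the screen has already been preprocessed into a 2D grid of integer color labels with 0 as background. The grid format, the 4-connectivity rule, and the function name `extract_objects` are hypothetical choices for illustration, not OpenCV's API. It flood-fills each same-colored object and emits one [x1, y1, x2, y2, ..., col] list of that object's edge points.

```python
from collections import deque


def extract_objects(grid):
    """Return one feature list [x1, y1, x2, y2, ..., col] per connected object.

    Assumption: grid[y][x] holds a color label and 0 means background.
    """
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    result = []
    for sy in range(h):
        for sx in range(w):
            col = grid[sy][sx]
            if col == 0 or seen[sy][sx]:
                continue
            # flood-fill the 4-connected region of pixels sharing this color
            region = []
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and grid[ny][nx] == col:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            members = set(region)
            # an edge point is a region pixel with at least one 4-neighbour outside the region;
            # points are sorted by x, then y, so the output order is deterministic
            edge = sorted(
                (x, y) for x, y in region
                if any(n not in members for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
            )
            result.append([c for point in edge for c in point] + [col])
    return result


screen = [
    [0, 0, 0, 0, 0],
    [0, 2, 2, 2, 0],
    [0, 2, 2, 2, 0],
    [0, 2, 2, 2, 0],
    [0, 0, 0, 0, 7],
]
objects = extract_objects(screen)
print(objects)
# [[1, 1, 1, 2, 1, 3, 2, 1, 2, 3, 3, 1, 3, 2, 3, 3, 2], [4, 4, 7]]
```

Because edge-point lists differ in length from object to object while an ANN needs a fixed number of inputs, in practice you would probably reduce each list further, for example to a bounding box or a fixed number of resampled points; that step is left out here.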

This is the data that can be provided as input to the neural network. But you'll have to define some meaningful output parameters, depending on your specific problem statement, before you can train/test your system, of course.
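
For the online supervised learning asked about in the question, here is a minimal sketch in plain Python with no libraries. It shows a one-hidden-layer network that is updated by backpropagation after every single (features, desired action) pair, instead of after a pass over a stored data set. The three-number input [x, y, is_red], the `teacher` rule standing in for a human supervisor, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class OnlineMLP:
    """One hidden layer of sigmoid units and one sigmoid output, updated one sample at a time."""

    def __init__(self, n_in, n_hidden, lr=0.5, seed=0):
        rng = random.Random(seed)
        # w1[j]: weights from every input (plus a bias as the last entry) to hidden unit j
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # w2: weights from every hidden unit (plus a bias as the last entry) to the output
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
        self.lr = lr

    def _forward(self, x):
        xb = list(x) + [1.0]  # constant 1 feeds the hidden biases
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # constant 1 feeds the output bias
        y = sigmoid(sum(w * v for w, v in zip(self.w2, hb)))
        return xb, h, hb, y

    def predict(self, x):
        return self._forward(x)[3]

    def train_step(self, x, target):
        """One backpropagation step on the squared error of a single (x, target) pair."""
        xb, h, hb, y = self._forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # hidden deltas must use the output weights from before this update
        delta_h = [delta_out * self.w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for j in range(len(self.w2)):
            self.w2[j] -= self.lr * delta_out * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * delta_h[j] * xb[i]
        return 0.5 * (y - target) ** 2


def teacher(features):
    # hypothetical supervisor: act (1) when a red object is in the left half of the screen
    x, _y, is_red = features
    return 1.0 if is_red == 1.0 and x < 0.5 else 0.0


# simulate a stream of frames: each one is seen once, learned from, and discarded
stream = random.Random(1)
net = OnlineMLP(n_in=3, n_hidden=4)
for _ in range(30000):
    frame = [stream.random(), stream.random(), float(stream.random() < 0.5)]
    net.train_step(frame, teacher(frame))

print(net.predict([0.1, 0.5, 1.0]), net.predict([0.9, 0.5, 1.0]))
```

In a live setup, each frame's extracted features and the action the human player actually took would be passed to `train_step` as they arrive. Newer versions of scikit-learn also offer `MLPClassifier.partial_fit` for this kind of incremental training.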

Hope this helps.

Ok, I am not really proficient with ANNs, so feature extraction is something I wanted to avoid by using pixels :). But it seems like I cannot get around it and have to write some descriptors for my data. Is there anything ready-made I could use? — tarrasch, Feb 08 '13 at 07:56
http://stackoverflow.com/questions/10799625/does-anyone-have-any-examples-of-using-opencv-with-python-for-descriptor-extract, http://stackoverflow.com/questions/9131552/feature-detection-in-opencv-python-bindings or http://stackoverflow.com/questions/6722736/opencv-python-and-sift-features might be of help? — Wingston Sharon, Feb 10 '13 at 08:25

score 0 · Answer 2 · answered Feb 07 '13 at 16:32

This is not entirely correct.

A 3-layer feedforward MLP (input layer, one hidden layer, output layer) can theoretically approximate any CONTINUOUS function arbitrarily well.
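
For reference, the result behind this claim is the universal approximation theorem for a single hidden layer; the precise form below is added here and is not part of the original answer. With a continuous sigmoidal activation σ: for every continuous function f on a compact set K ⊂ ℝⁿ and every ε > 0, there exist a number of hidden units N and parameters α_j, b_j ∈ ℝ, w_j ∈ ℝⁿ such that

$$
F(\mathbf{x}) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(\mathbf{w}_j^{\top}\mathbf{x} + b_j\right),
\qquad
\sup_{\mathbf{x} \in K} \left| f(\mathbf{x}) - F(\mathbf{x}) \right| < \varepsilon .
$$

The theorem only guarantees that such weights exist; it says nothing about how large N must be or whether training will actually find them.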

If there are discontinuities, then you need a 4th layer (i.e. a second hidden layer).

Since you are dealing with pixelated screens and such, you would probably need to consider a fourth layer.

Finally, if you are looking at circular shapes, etc., then a radial basis function (RBF) network may be more suitable.
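
To make the RBF suggestion concrete, here is a minimal sketch, assuming Gaussian hidden units on a fixed 5 x 5 grid of centers (real RBF networks often place centers with k-means instead) and a linear output layer trained online with the LMS (delta) rule. The task, deciding whether a point lies inside a hypothetical circle of radius 0.3, is made up for illustration. Each hidden unit responds only to the distance from its center, which is why this architecture suits round shapes.

```python
import math
import random


def rbf_layer(point, centers, width):
    """Gaussian hidden-unit activations: exp(-||point - c||^2 / (2 * width^2))."""
    px, py = point
    return [math.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2 * width ** 2))
            for cx, cy in centers]


class RBFNetwork:
    """Fixed Gaussian centers plus a linear output layer trained online with the LMS rule."""

    def __init__(self, centers, width, lr=0.1):
        self.centers = centers
        self.width = width
        self.lr = lr
        self.weights = [0.0] * (len(centers) + 1)  # last weight is the bias

    def _phi(self, point):
        return rbf_layer(point, self.centers, self.width) + [1.0]

    def predict(self, point):
        return sum(w * p for w, p in zip(self.weights, self._phi(point)))

    def train_step(self, point, target):
        phi = self._phi(point)
        error = target - sum(w * p for w, p in zip(self.weights, phi))
        # delta rule: only the output weights change, the centers stay where they are
        for i, p in enumerate(phi):
            self.weights[i] += self.lr * error * p
        return error


def inside_circle(point, center=(0.5, 0.5), radius=0.3):
    # hypothetical "circular object" on a unit-square screen
    return 1.0 if math.hypot(point[0] - center[0], point[1] - center[1]) <= radius else 0.0


grid_centers = [(i / 4, j / 4) for i in range(5) for j in range(5)]
rbf_net = RBFNetwork(grid_centers, width=0.2, lr=0.1)
rng = random.Random(0)
for _ in range(30000):
    p = (rng.random(), rng.random())
    rbf_net.train_step(p, inside_circle(p))

print(rbf_net.predict((0.5, 0.5)), rbf_net.predict((0.05, 0.05)))
```

Changing `width` trades smoothness against how sharply the circle's edge can be fitted; the grid spacing of 0.25 and the width of 0.2 are arbitrary choices for this toy example.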
