I am trying to write a program that finds human figures in gameplay video from Call of Duty. I have compiled ~2200 separate images from this video, each of which either contains a human figure or does not, and I have tried to train a neural network to tell the two sets apart.

Then I divide each video frame into a couple hundred gridded rectangles and check each one with my ANN. The rectangles overlap in an attempt to capture figures that fall between grid rectangles, but this doesn't seem to work well. So I have a few questions:

  1. Are neural networks the way to go? I have read that they are very fast compared to other machine learning algorithms, and eventually I plan to use this with real-time video, so speed is very important.

  2. What is the best way to search the image frame for candidate figures to test with the ANN? I feel like the way I do it isn't very good: it takes about a second per 960 x 540 frame and has poor accuracy.

  3. Another problem is how best to build the feature vector used as input to the ANN. Currently, I just scale all input images down to 25 x 50 pixels and create a feature vector containing the intensity of every pixel. This is a very large vector (1250 floats). What are better ways to build a feature vector?
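
To make the current approach concrete, here is a minimal sketch of the overlapping grid and the raw-intensity feature vector described above. The window size (50 x 100) and the half-window stride are assumptions chosen for illustration, since the actual values are not stated here, and `sliding_windows` and `flatten_intensities` are hypothetical helper names.

```python
def sliding_windows(frame_w, frame_h, win_w, win_h, stride_x, stride_y):
    """Yield (x, y, w, h) rectangles that cover the frame.

    Neighbouring rectangles overlap whenever the stride is smaller
    than the window size.
    """
    for y in range(0, frame_h - win_h + 1, stride_y):
        for x in range(0, frame_w - win_w + 1, stride_x):
            yield (x, y, win_w, win_h)


def flatten_intensities(patch):
    """Flatten a 2-D list of 0-255 intensities into a feature vector in [0, 1]."""
    return [pixel / 255.0 for row in patch for pixel in row]


# Assumed numbers: 960 x 540 frame, 50 x 100 window, half-window stride.
windows = list(sliding_windows(960, 540, 50, 100, 25, 50))
print(len(windows))

# A patch 25 pixels wide and 50 pixels tall flattens to 25 * 50 values.
patch = [[128] * 25 for _ in range(50)]
features = flatten_intensities(patch)
print(len(features))
```

With a single window size like this, every rectangle costs one classifier evaluation, so halving the stride roughly quadruples the work per frame.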

For a more detailed explanation of what I do, see: CodAI: Computer Vision

EDIT: I would like a little more detail. What is the best way to calculate features? I need to be able to recognize a human figure in many different positions. Do I need to create separate classifiers for upright, crouched, and prone figures?

Nick Banks
  • I find this question too specific for this forum. – karlphillip Jan 24 '11 at 15:28
  • How can a question be too specific? I want to get some ideas on how to go about tracking human figures. – Nick Banks Jan 24 '11 at 17:49
  • I have a faint impression that the Kinect folks at your company do this kind of stuff for a living. Please let us know how it all went. – karlphillip Jan 24 '11 at 18:46
  • Yeah, well, their stuff would be a bit different. One, they don't have a moving camera position. Two, they have two cameras. Three, they are looking at real people, not rendered human characters. – Nick Banks Jan 24 '11 at 18:58
  • Great, please change the title of your question from *human figures* to *COD4 player models* since it's more accurate. – karlphillip Jan 24 '11 at 19:10

4 Answers

This problem is too hard for a normal ANN.

ANNs aren't really very well suited to images with lots of spatial transformations (i.e. human figures in different positions). They effectively need to learn each possible position independently, since they can't generalise well over translations, rotations and scaling etc. Even if you managed to make it work, you'd probably need billions of training images and years of training time.

Your best bet is probably to go with either:

mikera
  • How do I go about building and training a Haar with OpenCV? – Nick Banks Jan 20 '11 at 19:54
  • @gamernb There's a tool that comes with OpenCV called opencv_haartraining, you could use that. But be aware that training can take a long time - and I mean days, not hours. – carnieri Jan 20 '11 at 20:26
  • It is said that boosted Haar-like features work well with rigid objects that always face the camera in approximately the same direction. I'm not sure that applies to your case; I guess it depends on the pose of the human figures. I think it's a better idea to start with something that gives you faster feedback, so you can explore ideas easily. – carnieri Jan 21 '11 at 04:15
  • @carnieri Then what would be the best way to go about recognizing the human form (I will start out with just the upright form, but eventually need crouched and prone) in any movement position, i.e. standing towards me, away from me, middle of running... ? – Nick Banks Jan 21 '11 at 18:54

  • Using the raw intensities as the feature vector is not going to work [1]. There is too much variation induced by lighting etc.
  • A good feature to look at as a first step would be HOG. OpenCV 2.2 has a GPU (CUDA) version of a HOG detector that is fast.
  • Neural networks are maybe not the best way to go. Usually you'd use an SVM or boosting as the classifier [2]. It's not that neural networks are not powerful enough; it's that it's hard to get the training and parameters right, and too often you get stuck in local minima.
  • For prone/crouched/standing figures, you definitely want separate classifiers, combined in a mixture model.
  • You asked for a "best way". Human detection is far from a solved problem, so no one knows the best way. The things mentioned above are known to work pretty well.
  • If you want a good result, you definitely want to exploit the fact that your target is specific: you are trying to detect humans in Call of Duty. The range of positions you need to check is not the whole image, since the figures will be near the ground. This lets you speed up the search and reduce false detections. If you can, reduce the rendering detail: less detail means less variation, which means an easier learning problem.

Footnotes:
[1] For the nitpickers: not without a highly complex classifier.
[2] You can also employ a cascade of boosted classifiers to gain speed without giving away too much in detection rate.
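
To illustrate why gradient-based features such as HOG cope with lighting better than raw intensities, here is a minimal sketch of the per-cell building block only: a magnitude-weighted histogram of unsigned gradient orientations. It is not the full HOG descriptor (no block normalisation, no bin interpolation) and not OpenCV's implementation; `orientation_histogram`, the 9-bin choice, and the toy patch are assumptions made for the example.

```python
import math


def orientation_histogram(patch, bins=9):
    """Normalised histogram of unsigned gradient orientations (0-180 degrees).

    patch is a 2-D list of intensities with rows of equal length.
    Each interior pixel votes for its orientation bin, weighted by
    gradient magnitude.
    """
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # horizontal central difference
            gy = patch[y + 1][x] - patch[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


# Toy patch: dark left half, bright right half (a vertical edge).
patch = [[10] * 4 + [200] * 4 for _ in range(8)]
# The same patch under uniformly brighter lighting.
brighter = [[p + 40 for p in row] for row in patch]

h1 = orientation_histogram(patch)
h2 = orientation_histogram(brighter)
print(patch == brighter)  # raw intensities differ
print(h1 == h2)           # gradient histograms agree
```

A constant brightness offset cancels out in the pixel differences, so the histogram is unchanged while the raw-intensity vector is not. Real lighting changes are not purely additive, which is one reason the full descriptor also normalises over blocks.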

etarion

Better features win over better learning algorithms. The basic principle in feature selection is that the best features maximize interclass variance and minimize intraclass variance. In your case, the features should emphasize the difference between images that contain a human figure and images that don't, and deemphasize the differences between images of the same class.
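
One common way to put a number on this principle for a single feature is a Fisher-style ratio: squared distance between the class means divided by the summed within-class variances. This is a sketch of that idea, not something taken from OpenCV; `fisher_score` is a hypothetical helper and the feature values below are made up purely for illustration.

```python
from statistics import mean, pvariance


def fisher_score(positives, negatives):
    """Between-class separation divided by within-class spread for one feature.

    Larger is better: the class means are far apart and the values
    inside each class are tightly grouped.
    """
    between = (mean(positives) - mean(negatives)) ** 2
    within = pvariance(positives) + pvariance(negatives)
    return between / within if within > 0 else float("inf")


# Made-up values of two candidate features on "figure" / "no figure" images.
good_pos, good_neg = [0.9, 1.0, 1.1], [0.0, 0.1, -0.1]
poor_pos, poor_neg = [0.2, 1.0, 1.8], [-0.5, 0.4, 1.0]

print(fisher_score(good_pos, good_neg) > fisher_score(poor_pos, poor_neg))
```

Scoring each candidate feature this way on a labelled sample gives a quick, classifier-independent hint about which features are worth keeping.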

For instance, you could try to find the contour of the human figure and calculate features based on that contour. OpenCV already has some functions for calculating features of contours: Moments, GetCentralMoment, GetNormalizedCentralMoment, etc. The question then would be: how do you segment human figures from the background so that their contour can be found? There are several ways to approach this problem, such as texture segmentation.
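
As a rough illustration of what moment-based shape features look like, here is a pure-Python sketch that computes raw, central, and normalised central moments of a binary silhouette using the standard textbook definitions. It assumes segmentation has already produced a mask, does not call OpenCV, and the helper names and the toy blob are hypothetical.

```python
def raw_moment(mask, p, q):
    """Raw moment m_pq: sum of x^p * y^q over foreground pixels."""
    return sum((x ** p) * (y ** q)
               for y, row in enumerate(mask)
               for x, value in enumerate(row) if value)


def central_moment(mask, p, q):
    """Central moment mu_pq, measured relative to the shape's centroid."""
    m00 = raw_moment(mask, 0, 0)
    cx = raw_moment(mask, 1, 0) / m00
    cy = raw_moment(mask, 0, 1) / m00
    return sum(((x - cx) ** p) * ((y - cy) ** q)
               for y, row in enumerate(mask)
               for x, value in enumerate(row) if value)


def normalized_central_moment(mask, p, q):
    """Scale-normalised central moment: mu_pq / m00^((p + q) / 2 + 1)."""
    m00 = raw_moment(mask, 0, 0)
    return central_moment(mask, p, q) / (m00 ** ((p + q) / 2 + 1))


# Toy silhouette: an upright blob 2 pixels wide and 4 pixels tall.
blob = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]
# The same blob moved three pixels to the right.
shifted = [[0, 0, 0] + row[:3] for row in blob]

# Central moments do not change when the shape is translated.
print(central_moment(blob, 2, 0) == central_moment(shifted, 2, 0))
# Vertical spread exceeds horizontal spread for an upright shape.
print(central_moment(blob, 0, 2) > central_moment(blob, 2, 0))
```

Comparing the vertical and horizontal second-order moments is one crude cue for telling an upright silhouette from a prone one, provided the segmentation is clean enough.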

Once you can solve the segmentation problem and calculate reasonable features, the choice of learning algorithm is not really that important. But why not try several and see what works best? Take a look at the Machine Learning section in the OpenCV docs.

carnieri

It's not crystal clear to me what you are trying to accomplish, but it seems that you are trying to do real-time player tracking (or something similar) using the wrong approach. Human tracking is something that one would expect to be done through digital image/video processing of pictures of real human beings.

Depending on your purpose, player tracking is something that should not be done through image processing because it can be very demanding on the CPU. Tracking player models inside a game is a practice usually used for cheating applications, and it requires one to either inject code into the game process or act as the middleman between the game engine and the graphics driver. Since the game client always knows where the other players are (even if you cannot see them), one could search the process memory for the X,Y,Z coordinates of the players, or intercept graphics rendering calls to find the location where a player model will be rendered on the screen (which can be a little tricky, since it requires a basic understanding of OpenGL/DirectX and debugging skills).

I'm not sure if it's OK to detail such techniques on StackOverflow, but I will say that this topic has been widely discussed on several reverse-engineering and cheating forums like GameDeception.

karlphillip
  • The whole reason that I am using image processing is so that it would not be considered cheating. My goal is to create an AI that can play the game exactly as a human would. See the screen and control the controller. – Nick Banks Jan 24 '11 at 16:19
  • @gamernb Even if you build a robot to sit on your chair and play for you, it would still be considered cheating. Whatever you do to automate tasks that a player should do manually, it's cheating! AI decision making based on screenshot processing would not be very efficient in a (dynamic) FPS game, even if you use the GPU to do the image processing for you (supposing that you succeed in implementing the-all-mighty-algorithm-everyone-on-earth-would-like-to-know). – karlphillip Jan 24 '11 at 17:45
  • @gamernb I've implemented a few *applications* that would automate game tasks based on screenshot processing in the past. Several anti-cheat systems do what they can to prevent you from taking screenshots of the game automatically by blocking certain Win32 API calls. Anyway, this book discusses several issues I pointed out, including a great discussion of what is considered game cheating: http://www.amazon.com/Exploiting-Online-Games-Massively-Distributed/dp/0132271915 – karlphillip Jan 24 '11 at 17:53
  • I plan on using this on the Xbox, not PC. I will be taking HDMI output from the Xbox and feeding it into a computer, so this will remove the CPU drain from the actual gameplay. The whole point of this is just an exercise to see if it is possible. I don't plan on taking advantage of it. It's just a project to give me something to do. – Nick Banks Jan 24 '11 at 18:12
  • @gamernb Ok, I hope you enjoy it. But your current solution is not going to be very practical. – karlphillip Jan 24 '11 at 18:14
  • I figured as much; that's why I am trying to find a better approach. One of the big things is, once I create a classifier, how exactly do I find sections of a frame to test on? My current approach of breaking everything down and testing it all doesn't seem to be very accurate. – Nick Banks Jan 24 '11 at 18:20