6

I am working in MATLAB

Plots

NOTE: The data plotted here is the x-position, tracked through all the frames, of the pixel located at (i,j) in the FIRST frame. For example, the pixel at (23,87) in the first frame has an x-position of 35 at the end of the sequence (as visible in the plot).

Here are some typical plots of x_pos for a few different values of (i,j). Here (i,j) refers to the pixel's position in the first frame, not its position throughout all frames.

For (i,j) = (23,87)

(i,j) = (42,56)

(i,j) = (67,19)

Nishant
  • 2,571
  • 1
  • 17
  • 29
  • Can you tell us more about what these pixels represent? To me, your graphs look like some complicated function without a good predictable structure, like the movements of a fly in a room. That is, as a human I can't predict where a particular pixel will go next, only the approximate direction. So first of all I'd want to know whether it is predictable from training data at all, or whether the random error will be too large anyway. – ffriend Jun 05 '14 at 22:17
  • 1
    @ffriend The data is predictable; please see the input frames in the EDIT. The input frames show the motion of an object over a short duration, so it is highly likely that the object is moving along a particular path. The graph looks a bit complicated because the camera moves along with the object. – Nishant Jun 06 '14 at 09:21
  • @Nishant I deleted my answer, I think it's better this way for your question/bounty. The summary of my (deleted) answer is just that to predict the value of a specific pixel, your algorithm needs to look at the values (in the preceding frames) of the pixels around the one you are trying to predict. You really need to try to track the direction in which each object in the frame is moving. – Max Jun 06 '14 at 09:43
  • @Nishant: I believe the confusion partially comes from the question itself. It's not actually a pixel that moves between frames, but some real-life object. Pixels are still and bound to a specific frame, and it is our mind that links frames together and recognizes a moving object in them. I believe the whole question may be simplified if you emphasize the object trajectory part and not the pixel part. – ffriend Jun 06 '14 at 09:48
  • 1
    @Max I am exactly doing what you have suggested. – Nishant Jun 06 '14 at 10:46
  • @Nishant ok, I undeleted my answer... :-) – Max Jun 06 '14 at 12:02
  • @Max Your comment here and your answer have opposite meanings. Your comment here suggests that I do something which I am actually doing, "I am actually tracking the direction in which each object in the frame is moving", whereas your answer says straight away that I cannot solve my **Problem**. So in my humble opinion your answer is improper and incomplete. – Nishant Jun 06 '14 at 12:09
  • @Nishant I just edited the answer, if this doesn't help drop me another comment and I'll re-delete it – Max Jun 06 '14 at 12:13
  • @Nishant I would also suggest using a better title. "Training algorithm to train this specific Data" tells essentially nothing about the real problem; maybe a title like "Best algorithm to predict pixel movement vector in a video" could attract more viewers (I think a question with a 150 bounty should be viewed more than 65 times...) – Max Jun 06 '14 at 12:25
  • Thanks for the suggestion, I will do that now – Nishant Jun 06 '14 at 12:49

2 Answers

2

A video is like a sequence of photos of real objects.
And real objects, in front of a camera, can do only 2 different things:

  1. they stand still
  2. they move

If the pixels you are trying to predict are from a video, then you need to look at how pixels are moving on screen, because objects are moving on screen.

And this is how video codec compression works (H264, H265...) (clearly, video compression algorithms are much more complex than just trying to understand the direction of a pixel... :-) )

Here are some questions/answers on Stack Overflow that may help you:

Max
  • 7,408
  • 2
  • 26
  • 32
  • To be precise, `For each pixel of the first frame , I have 92 values of where they have been in the next frames`, so they are related. Please see the EDIT. – Nishant Jun 06 '14 at 06:19
  • Please edit your answer so that it doesn't confuse others viewing this question. What you are saying contradicts the **problem** part of my question. – Nishant Jun 06 '14 at 06:28
  • 1
    @Nishant I edited the answer. Do you believe that the rest of the answer still applies, or did I totally misunderstand your problem? What I'm saying is that to predict the value of a specific pixel, your algorithm needs to look at the values (in the preceding frames) of the pixels around the one you are trying to predict. – Max Jun 06 '14 at 07:52
  • No, it still doesn't address my question. I clearly understand what you are saying. You mean that the intensity at a particular position throughout all the frames is not related. And that is true. But I don't have that. I have the position of a particular pixel of the **FIRST** frame throughout all the frames. Please read the question again and refer to the **NOTE** of the **PLOT** section. Then edit your answer accordingly, as it has already confused those who have upvoted it. – Nishant Jun 06 '14 at 09:20
  • Thanks for the additional links. I have already calculated optical flow and have an object track. At a quick look, I see that all these methods are for tracking objects, which makes it difficult to predict if the object size decreases. Nevertheless, I will be looking into these in detail – Nishant Jun 06 '14 at 16:45
  • @Nishant I'm no expert on object tracking... but maybe it's possible to: find a moving object, then try to track it in 3D by looking at its size change... but it's complex, a size change can be due to a distance change, or due to a rotation of the object (unless the object is a sphere...) – Max Jun 06 '14 at 18:45
2

So it's not about pixels in the image, but more about a moving object, which makes the task much more tractable. Your data is indeed a time series, so time-aware algorithms are preferable. Markov models (in particular Markov chains and the somewhat more sophisticated hidden Markov models) are classic examples of them.

However, your input is noisy because of camera instability. Thus, an even better solution would be to use a Kalman filter, a model similar to HMMs but with an explicit notion of noise. It is widely used in robotics, navigation and similar areas to estimate the current position and predict the future position of a vehicle based on inexact sensor data and historical information. Doesn't that sound similar to what you need?

I'm not a big fan of MATLAB, but it seems to have a `kalman` function that implements the mentioned filter.
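A minimal sketch of the Kalman-filter idea for a single pixel track, written in plain Python rather than MATLAB so that it stays self-contained. It assumes a constant-velocity motion model for the x-position and treats each frame's tracked position as a noisy measurement. The function names `kalman_track` and `predict_ahead`, the noise parameters `q` and `r`, the initial covariance, and the synthetic track are all illustrative assumptions; none of them come from the question or from MATLAB's `kalman` function.

```python
def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Filter noisy 1-D positions with a constant-velocity Kalman filter.

    q is the (assumed) process-noise variance added to each state component,
    r is the (assumed) measurement-noise variance.
    Returns (filtered_positions, (position, velocity), covariance_terms).
    """
    # State is [position, velocity]; start at the first measurement, at rest.
    pos, vel = float(measurements[0]), 0.0
    # Symmetric covariance P = [[p00, p01], [p01, p11]]; large = uncertain.
    p00, p01, p11 = 10.0, 0.0, 10.0
    filtered = []
    for z in measurements:
        # Predict step: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = pos + dt * vel
        p00 = p00 + dt * (2.0 * p01 + dt * p11) + q
        p01 = p01 + dt * p11
        p11 = p11 + q
        # Update step: only the position is measured, so H = [1, 0].
        s = p00 + r                      # innovation variance
        k0, k1 = p00 / s, p01 / s        # Kalman gain for position, velocity
        resid = z - pos                  # innovation (measurement residual)
        pos += k0 * resid
        vel += k1 * resid
        # Covariance update: P = (I - K H) P.
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
        filtered.append(pos)
    return filtered, (pos, vel), (p00, p01, p11)


def predict_ahead(state, steps, dt=1.0):
    """Extrapolate future positions from the last (position, velocity) state."""
    pos, vel = state
    return [pos + vel * dt * (i + 1) for i in range(steps)]


# Synthetic example: 50 observed frames, then predict the next 10 positions.
track = [20.0 + 0.5 * t for t in range(50)]
filtered, state, _ = kalman_track(track)
future = predict_ahead(state, steps=10)
print(future)
```

The same filter would be run independently on the y-position. A constant-velocity model is only one possible choice; tracks with sharp kinks may need larger `q` or a richer motion model.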

ffriend
  • 27,562
  • 13
  • 91
  • 132
  • 1
    You are right, it is a time series. I have tried the time-series toolbox in MATLAB. Can you look at its algorithm and tell me whether it is different from Markov models? The Kalman filter sounds similar. I will check it out. – Nishant Jun 06 '14 at 10:44
  • If you are talking about the [closeloop](http://www.mathworks.com/help/nnet/ref/closeloop.html) function, then no, it's not about Markov models, but about neural networks (probably recurrent). Recurrent NNs are also a good approach, though it's sometimes hard to choose good hyperparameters for them. Anyway, you need to look for time-series-based methods, not something static like a normal multilayer perceptron or SVM. – ffriend Jun 06 '14 at 14:36
  • BTW, if these methods fail, then most probably the camera instability noise is too high and you need to run camera stabilization algorithms first. But this is a different story that deserves a separate question. – ffriend Jun 06 '14 at 14:40
  • I have data from a fixed camera too, but I am still nowhere near a prediction with `mean square error` less than 2 for any of the methods listed in the **What I have Tried** section of the question. – Nishant Jun 06 '14 at 15:09
  • 2 pixels from 100x100 images? Well, it seems like a very good result, actually. Anyway, I see that you have tried many different methods, but haven't _worked out_ any of them. It's not enough to just use a built-in function. As I outlined above, it's very important to tune the hyperparameters (e.g. number of layers and neurons in NNs), select good features and so on. You should really pay more attention to the details of a specific method, not to a wide variety of different methods. – ffriend Jun 06 '14 at 15:30
  • What do you mean by the question "2 pixels from 100x100 images?"? Also, I have given a fair share of my time to each method; for example, I have tried tuning the `number of neurons in NN`. However, I am avoiding going much deeper: I haven't interfered with the initial values and weights between different layers. The reason is that the network would then be trained to a specific data set and hence would fail on a new data set. – Nishant Jun 06 '14 at 16:40
  • You said that you have a minimal mean square error of 2, right? I understand that you do all the calculations in pixel coordinates, and thus you have `MSE = 1/N * sum((predicted - actual)^2) = 2`, which is the average difference between actual and predicted data. If my understanding is correct, then a 2-pixel difference is really good. If not, please explain in more detail what you mean by saying "The least absolute mean error between predicted and actual in all these methods is 2". I'm also not getting your statement on NNs. What do you mean by "new data set"? – ffriend Jun 06 '14 at 17:09
  • Yes, you understand correctly. But an error of 2 in pixel coordinates is not good. The reason is that the error is random, so each pixel deviates randomly within a circle of radius 2, and thus the object structure is lost while generating the predicted images. Also, I am working with low-resolution images (108x92), where 2 pixels is a fairly big number. – Nishant Jun 06 '14 at 17:21
  • By new data set, I mean that I have many pixels of the same object and, even if the camera moves smoothly, their tracks don't follow the same curve. There are usually some kinks in the curve that make the training difficult. Also, I will have to test the final result on different videos with both static and moving cameras. – Nishant Jun 06 '14 at 17:25
  • So you really try to track each pixel of an object in an image sequence? Then I'm pretty sure it won't work. You'll always have some minimal noise for each pixel, and this will break the object's structure anyway. Instead, you can track the object's transformation, including changes in size, shape and position, which is a very different story. So let's start from the beginning: what is your task at the very high level? What are your images and what objects are in them? – ffriend Jun 06 '14 at 17:55
  • At a very high level, I have to predict future frames of a given image sequence, which is generally continuous frames taken from a video. There are moving objects in the video. The aim is to take 50 frames and predict 10 frames. I was working on a moving camera earlier, but recently switched to a static camera. The objects are generally vehicles and the video is generally traffic cam footage. – Nishant Jun 06 '14 at 18:09
  • I have never heard of predicting whole frames, and I doubt it's really possible. Predict object position and size - yes, predict the entire image - no. Even a human can't do this. – ffriend Jun 06 '14 at 21:11
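
To make the error figure from the comments above concrete, here is a small Python sketch of the mean squared error as written there, `MSE = 1/N * sum((predicted - actual)^2)`. The function names and the sample values are made up for illustration. Since MSE is measured in squared pixels, an MSE of 2 corresponds to a root-mean-square error of about 1.41 pixels, not 2 pixels.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root-mean-square error, in the same units as the data (pixels)."""
    return math.sqrt(mse(predicted, actual))


# Hypothetical x-positions: the errors are -2, 0, 2, 0 pixels.
predicted = [30.0, 31.0, 32.0, 33.0]
actual = [32.0, 31.0, 30.0, 33.0]
print(mse(predicted, actual), rmse(predicted, actual))
```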