
In short, I'm doing the same thing an optical mouse does.
I'm taking two 2D arrays of gray-scale values, and right now I'm comparing equal values to see what the displacement is.

Example:
Array1:
1 1 0 0
0 1 0 0
0 0 0 0
0 0 0 0

Array2:
0 0 0 0
0 1 1 0
0 0 1 0
0 0 0 0

Here is the code I have right now to test it. I'm only checking for 1s right now, as if it were the actual image; changing that isn't hard.

public class Displacement {
    public static void main(String[] args) {
        int[][] t1 = new int[][]{
                {1,1,0,0},
                {0,1,0,0},
                {0,0,0,0},
                {0,0,0,0}
        };
        int[][] t2 = new int[][]{
                {0,0,0,0},
                {0,1,1,0},
                {0,0,1,0},
                {0,0,0,0}
        };
        double mag = 0.0;
        double angle = 0.0;
        int num = 0;
        for (int i = 0; i < t2.length; i++) {
            for (int j = 0; j < t2[i].length; j++) {
                if (t2[i][j] == 0) continue;
                // scan through and accumulate magnitude/angle over every
                // pair of set pixels, then average below
                if (t2[i][j] == 1) {
                    for (int k = 0; k < t1.length; k++) {
                        for (int l = 0; l < t1[k].length; l++) {
                            if (t1[k][l] == 1) {
                                mag += calculateMagnitude(l, k, j, i);
                                angle -= calculateAngle(l, k, j, i);
                                num++;
                            }
                        }
                    }
                }
            }
        }
        double fMag = mag / num;
        double fAngle = angle / num;
        System.out.println(fMag);
        System.out.println(fAngle);
    }

    public static double calculateAngle(int x1, int y1, int x2, int y2) {
        if (y2 == y1) {
            if (x2 > x1) return 90.0;
            else if (x2 < x1) return -90.0;
            else return 0.0;
        } else if (x2 == x1) {
            if (y2 > y1) return 0.0;
            else return -180.0; // y2 < y1 is the only remaining case here
        }
        return Math.toDegrees(Math.atan(((double) (y2 - y1)) / (x2 - x1)));
    }

    public static double calculateMagnitude(int x1, int y1, int x2, int y2) {
        double d1 = Math.pow(x2 - x1, 2);
        double d2 = Math.pow(y2 - y1, 2);
        return Math.sqrt(d1 + d2);
    }
}
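As an aside on precision: the special cases in `calculateAngle` can be avoided entirely with `Math.atan2`, which resolves the quadrant by itself. A minimal sketch (note this uses the standard math convention, 0° along the +x axis with counterclockwise positive, which differs from the convention in the code above):

```java
public class AngleDemo {
    // Math.atan2 handles all quadrants and the vertical case, so no
    // special-casing is needed.
    public static double angleDeg(int x1, int y1, int x2, int y2) {
        return Math.toDegrees(Math.atan2(y2 - y1, x2 - x1));
    }

    public static void main(String[] args) {
        System.out.println(angleDeg(0, 0, 1, -1)); // approximately -45.0
    }
}
```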

However, this is quite taxing, as it's O(n^4), and I'm sure there are more efficient ways to do this. I've done quite a bit of research but so far haven't been able to figure out how. Also, the exact answer here should be 1.414 and -45, which means I'm off by roughly 6%. That's okay, but I'd like to be more exact.

If anyone knows a way, or can figure out a way, to do this more efficiently and/or precisely, please post. Not to sound like an ass, but linking me to a PhD research paper and saying it should work isn't what I'm looking for. I've done a fair amount of research, and those papers mostly assume the image stays completely on the screen.

I'm looking for a way to calculate the image displacement even if a portion of the image goes off screen.

Chrispresso

2 Answers


It seems that you have a simple registration problem. I'm pretty sure there are simpler ways to solve it, but the fastest (in terms of implementation time) is to use something like SIFT. If you don't have a problem with using third parties, you can pick something from this list: Implementing SIFT in Java.

SIFT will find similar patches in both images, and from there it will be pretty easy to calculate the translation between them.
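For illustration: once you have matched point pairs (from SIFT or any other matcher), a pure translation is just the mean displacement of the matches. A minimal sketch with made-up coordinates standing in for real matches (`meanTranslation` is a hypothetical helper, not part of any SIFT library):

```java
public class MatchTranslation {
    // pts1[i] and pts2[i] are a matched pair (x, y) in image 1 and image 2.
    // For a pure translation, the shift is the average displacement.
    public static double[] meanTranslation(double[][] pts1, double[][] pts2) {
        double dx = 0, dy = 0;
        for (int i = 0; i < pts1.length; i++) {
            dx += pts2[i][0] - pts1[i][0];
            dy += pts2[i][1] - pts1[i][1];
        }
        return new double[]{ dx / pts1.length, dy / pts1.length };
    }

    public static void main(String[] args) {
        double[][] a = {{1, 1}, {2, 1}, {1, 2}};
        double[][] b = {{2, 2}, {3, 2}, {2, 3}};
        double[] t = meanTranslation(a, b);
        System.out.println(t[0] + ", " + t[1]); // prints 1.0, 1.0
    }
}
```

With real matches you would want to reject outliers (e.g. discard pairs far from the median displacement) before averaging.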

OopsUser

This answer isn't too specific but is too long to fit in a comment:

The appropriate method depends on your input and the scenario. It's not clear to me if you're trying to see how a particular point in the image moves, or if you're trying to align the whole image to the next frame. Different parts of your question suggest one or the other to me.

Are you able to add examples of the images/frames you want to use? How are they captured? This will help a lot. I'm not sure if you're trying to align complex photographs, basic screenshots with a mouse cursor, or something else entirely. If you can be as specific as possible here, people can hopefully help you with the technique without simply linking you to a research paper.

If you are trying to find where a specific part of one image is in the next frame, you should look up "template matching".
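A brute-force sketch of template matching using the sum of absolute differences (SAD), demonstrated on the 4x4 toy arrays from the question (`bestMatch` is my own name for it, not a library call):

```java
public class TemplateMatch {
    // Slide the template over every position in the image and keep the
    // position with the lowest sum of absolute differences.
    public static int[] bestMatch(int[][] image, int[][] tmpl) {
        int bestR = 0, bestC = 0, bestScore = Integer.MAX_VALUE;
        for (int r = 0; r + tmpl.length <= image.length; r++) {
            for (int c = 0; c + tmpl[0].length <= image[0].length; c++) {
                int score = 0;
                for (int i = 0; i < tmpl.length; i++)
                    for (int j = 0; j < tmpl[0].length; j++)
                        score += Math.abs(image[r + i][c + j] - tmpl[i][j]);
                if (score < bestScore) {
                    bestScore = score;
                    bestR = r;
                    bestC = c;
                }
            }
        }
        return new int[]{ bestR, bestC };
    }

    public static void main(String[] args) {
        int[][] img = {
            {0, 0, 0, 0},
            {0, 1, 1, 0},
            {0, 0, 1, 0},
            {0, 0, 0, 0}
        };
        int[][] tmpl = {
            {1, 1},
            {0, 1}
        };
        int[] pos = bestMatch(img, tmpl);
        System.out.println(pos[0] + "," + pos[1]); // prints 1,1
    }
}
```

A tolerant score like SAD also gives you the "margin of error" you mentioned for noisy gray-scale values, since you take the best score rather than requiring exact equality.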

If you are trying to align one whole image to the next, and you know the alignment is a simple translation, then you should look up image alignment and image registration, along with the term "coarse to fine".

A technique that operates "coarse to fine" generally works as follows: start with small, resized versions of both images and find the displacement there; then scale up and find the displacement at the next scale, using the coarse solution as an initial guess (searching only close to that guess); repeat until you reach full resolution. The aim is to speed things up and to avoid the solution getting trapped in local minima.
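The coarse step of such a pyramid can be sketched as a simple 2x2 averaging downsample; the refinement loop is only outlined in comments, and `findShiftNear` is a hypothetical local search, not a real API:

```java
public class CoarseToFine {
    // Halve an image by averaging 2x2 blocks -- the "coarse" step of a
    // pyramid (assumes even dimensions for brevity).
    public static double[][] halve(double[][] img) {
        double[][] out = new double[img.length / 2][img[0].length / 2];
        for (int r = 0; r < out.length; r++)
            for (int c = 0; c < out[0].length; c++)
                out[r][c] = (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                           + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0;
        return out;
    }

    public static void main(String[] args) {
        double[][] img = {
            {1, 2, 1, 2},
            {3, 4, 3, 4},
            {1, 2, 1, 2},
            {3, 4, 3, 4}
        };
        double[][] small = halve(img);
        System.out.println(small[0][0]); // prints 2.5

        // Refinement sketch: from coarsest to finest level, double the
        // current displacement guess, then search a small window around it:
        // guess = findShiftNear(level1, level2, 2 * guess);
    }
}
```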

If you have two frames with lots of complex motion, then you want to look up "optical flow", which aims to find a per-pixel displacement.

YXD
  • The way I'm thinking about it is as if you take an image, let's say a circle, and you move it an arbitrary magnitude and angle away. This could result in only one-quarter of the circle to be left on the screen. So I have to compare the second image to the first and see where it appears. From the ones you mentioned it sounds like optical flow to me – Chrispresso Dec 10 '13 at 00:18
  • So you're looking at artificial images, not photographs? Where are you getting the input from? Sounds more like template matching. – YXD Dec 10 '13 at 00:43
  • It's going to be input from the camera of a smart phone. It's just going to be gray scale. So I'm going to be comparing the gray scale value of the image – Chrispresso Dec 10 '13 at 04:41
  • You should bear in mind that the gray scale values will pretty much never be identical between frames. What kind of thing will the phone camera be recording? Can you please show example input images? What do you need to do with the translation/optical flow once you have calculated it? For a general scene with a moving camera you cannot just calculate a single translation for the image - you will need an optical flow algorithm - but in restricted scenarios (e.g. the scene just contains a planar surface) you can use faster methods. – YXD Dec 10 '13 at 09:58
  • I wish I could provide a picture. It's a two person project and the other person doesn't have that part fully working yet. It's going to be on a planar surface like a table. It's going to act like an optical mouse, so detecting where it's moving. I know it's not going to be the exact gray scale image, so I figured I'd give a margin of error for it. – Chrispresso Dec 10 '13 at 16:23