Algorithm fast compare images \ matrix

Question

There is one image (imA) size 10x10px and more 60 000 images (imN) 10x10

All images are black and white

The task of finding the minimum number of points with which to distinguish the first image (imA) from all others (imN) - sorry for my bad English, I add img and comment

The first thing I did, it turned all the images in a matrix with numpy

q=0
for file in inputImages:
    eachImage = os.path.join(generatorFolder, file)
    a[q]=numpy.asarray(Image.open(eachImage))
    q+=1

b=numpy.asarray(Image.open(templateimage))

b[y,x,color] color its list [255,255,255]

a[1-60000,y,x,color]

Next I use a nested comparison, non-recursive search with depth in 3-point looks something like this:

for y1 in range(b.shape[0]):
    for x1 in range(b.shape[1]):
        for y2 in range(b.shape[0]):
            for x2 in range(b.shape[1]):
                for y3 in range(b.shape[0]):
                    for x3 in range(b.shape[1]):
                        if y1==y2==y3 and x1==x2==x3:continue

                        check=0
                        for a_el in range(a.shape[0]):
                            if numpy.array_equal(b[y1,x1],a[a_el,y1,x1]) and \
                               numpy.array_equal(b[y2,x2],a[a_el,y2,x2]) and \
                               numpy.array_equal(b[y3,x3],a[a_el,y3,x3]):
                                check=1
                                break

                        if not check:return 'its unic dots'

The problem with this code is that it is very slow. For instance, we first image is different from all others at least five points:

get 100! / 95! * 60 000 comparisons - 542,070,144,000,000

True, I use a slightly different algorithm, which allows you to turn this into: 40!/35!*60000 = 4.737.657.600.000 that not too little.

Is there a way to solve my problem more beautiful, and not brute force.

UPDATE add img

enter image description here

0 line: 3 other image (imN) 4x4

1 line: 0 template image (imA) and 1-3 image where the red marked difference (imA XOR imN)

2 line: 0 image where the blue marked two dots two points for comparison,

    1 image green its difference, red its compare - difference yes - NEXT

    2 image red its compare - difference NO - Break (these two points is not enough to say that imA differs from imN(2))

3 line: like line 2 the other dots

4 line: We chose two dots is enough to say that imA differs from imN(1-3)

Second line of images: some pixels which are in imA but not in imB are marked but others are not marked. why? — Winston Ewert, Sep 10 '11 at 01:43
Those two pixels are not sufficient to differentiate between imA and im1 or im2, as far as I can see. Those would be (0,1) and (2,1), which are not both set in any of the three combinations. — Benjamin, Sep 10 '11 at 03:00

score 1 · Answer 1 · answered Sep 09 '11 at 11:21

If I understand your question correctly, you need to calculate the number of points on the first picture, which is different from all the other pictures, irrespective of how the other pictures differ from each other?

If this is case, unless I'm missing something, can you not simply do something like the following:

boolean[10][10] DIFFS // all values set to TRUE
int[10][10] ORIGINAL  // store first pictures color values

foreach IMAGE in [IMAGES - FIRST IMAGE] {
    int[10][10] CURRENT <- IMAGE // store the current image's color values
    for (i : 0 -> 9) {
        for (j : 0 -> 9) {
            if (DIFFS[i][j]) {
                DIFFS[i][j] = ORIGINAL[i][j] != CURRENT[i][j]
            }
        }
    }
}

Then you are left with a 2-dimensional matrix DIFFS where each position indicates if the corresponding pixel in the original image differs from all the other pictures.

If I understand the syntax. Then your method gives a matrix of points that are unique to the first image. But in my case there are none. I added a picture and comment. — Echeg, Sep 09 '11 at 22:12

score 0 · Answer 2 · answered Sep 10 '11 at 04:07

My approach would be:

Read images into a 60,000 by 100 array, set as 1's and 0's.
Sum them on a per pixel basis to get a count of how many images each pixel is "set" to 1 in
Pick the pixel in your reference image which has the lowest sum. (If the sum is 0, then only that pixel is required to distinguish between the reference image and all the others)
Now looking only at images with that bit set, re-calculate the sums and pick the lowest again.
Repeat iteratively until either all the set bits in the reference image are selected (which means that either it is not possible to distinguish it or that all bits are required) or until the sum is equal to 1 which means only one image has that bit set.

In code, for 1000 4x4 images:

import numpy

def least_freq_set_pixel(array, must_have):
    # If it is specified that a certain pixels must be set, remove rows that don't have those pixels set
    if must_have != []:
        for i in must_have:
            array = numpy.delete(array, numpy.where(array[:,i] == 0), axis = 0)

    # Calculate the sum for each pixel
    set_in_n = numpy.sum(array, axis = 0)
    my_index = numpy.argmin(set_in_n)

    # Return the pixel number which is set in the fewest images
    return my_index, set_in_n[my_index]


# Create some test data 4x4 images
numpy.random.seed(11)
a = numpy.array([0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0])
b = numpy.random.randint(0,2,(1000,16))


must_have = []
stop = 0
while stop == 0:
    i,j = least_freq_set_pixel(b, must_have)
    print i,j
    # If the pixel is set in more than one image and not all pixels have been selected yet... find the next pixel
    if j > 1 and len(must_have) <= 16:
        must_have.append(i)
    else:
        stop = 1
        print must_have

which tells us that we need 7 pixels of the 16 to separate the reference image from the rest, pixels 0,1,2,4,5,10 and 15.

I copied your code and added check. [link]http://codepaste.ru/7617/ If I understand correctly, then your version does not work? — Echeg, Sep 11 '11 at 05:21
Does not quite work indeed. There is a problem with the condition checking when the sum is 0, which is more likely with small tests sets. I don't have time to fix it, but it illustrates the approach. — Benjamin, Sep 11 '11 at 13:36

score -1 · Answer 3 · answered Sep 09 '11 at 10:20

10x10 = 100. 100 comparisons between two images. you have 60000 images. I think that algorithm have to be O(100 * 60000) = O(6000000). I don't know python, but pseudo-algorithm should be like that:

int minDistinguishPoints = 100; 
int currentPointsDiff;
Image imA;

foreach (Image myImg in ImageItems)
{
    currentPointsDiff = 0;
    for (int i=0; i<10; i++)
       for (int j=0; j<10; j++)
       {
           if (!imA.GetPixel(i,j).Equals(myImg.GetPixel(i,j)))
           {
               currentPointsDiff++;
           }                
       }
    if (minDistinguishPoints > currentPointsDiff)
    {
        minDistinguishPoints = currentPointsDiff;
    }
}

May be I don't understand question. If so, explain some more detail, please.

@stukselbax: `O(6000000)` is silly because it is equal to `O(1)`. You need to parametirize the input, say with `n` (# of images), `w` (width of images) and `h` (height of images). Than the complexity is `O(n * w * h)`. Like this we can compare algorithms. — orlp, Sep 09 '11 at 22:09

knitti · Answer 4 · 2011-09-10T01:20:01.390

-1

If I understand correctly, we can completely redefine your problem. What I think you want to achieve: quickly identify, whether a certain given image equals one out predefined 60000 or none of these. Every image is 10x10 black/white.

Sooo every image can be interpreted as a 10x10=100 bit array, and you have 60000 predetermined given, against which you want to compare.

Why don't you just convert your 60000 images into 100bit-integers, and sort these. Then you can quite efficiently compare any 100bit-integer and find a hit or miss.

EDIT: if I understand the comment correctly, the images are much larger. As long as the known ones are still a manageable amount (60k is, and 600k is probably also), you could generate hashes for these images and sort and compare these. You have some up-front computing cost, but you have it only once.

edited Sep 10 '11 at 01:20

answered Sep 09 '11 at 22:54

knitti

6,817
31
42

In fact much more complicated than I described. imA not one of a few hundred. sequences of images is imN 100k * 100k imN. And I find the minimum number of points to speed up the following items of manipulation. According to my calculations it will give me the acceleration is about 20 times – Echeg Sep 10 '11 at 01:00
imA doesn't need to be of a few hundred, it could be any possible. But images as large as 100kx100k is entirely different than what you described. Throw a hashing algorithm on it. – knitti Sep 10 '11 at 01:09

Algorithm fast compare images \ matrix

4 Answers4

Linked