I'm trying to analyze an image-based 3-digit number captcha from an online resource. The digits do not move at all, so I use BufferedImage's getSubimage(...) method to extract each digit from the captcha. I have saved reference images of the digits 0-9 for each of the ones, tens, and hundreds places (30 images in total).

I read the bytes of the online image into a byte[] and then create a BufferedImage object like this:

BufferedImage captcha = ImageIO.read(new ByteArrayInputStream(captchaBytes));

Then I compare this image to a list of images on my drive:

BufferedImage[] nums = new BufferedImage[10];
//Load images into the array here... The code is removed.
for(int i = 0; i < nums.length; i++) {
    double x;
    System.out.println(x = bufferedImagesEqualConfidence(nums[i], firstNumberImage));
    if(x > 0.98) {
        System.out.println("equal to image " + i + ".jpeg");
        isNewEntry = false;
        break;
    }
}

This is how I compare two images:

static double bufferedImagesEqualConfidence(BufferedImage img1, BufferedImage img2) {
    double difference = 0;
    int pixels = img1.getWidth() * img1.getHeight(); 
    if (img1.getWidth() == img2.getWidth() && img1.getHeight() == img2.getHeight()) {
        for (int x = 0; x < img1.getWidth(); x++) {
            for (int y = 0; y < img1.getHeight(); y++) {
                int rgbA = img1.getRGB(x, y); 
                int rgbB = img2.getRGB(x, y); 
                int redA = (rgbA >> 16) & 0xff; 
                int greenA = (rgbA >> 8) & 0xff; 
                int blueA = (rgbA) & 0xff; 
                int redB = (rgbB >> 16) & 0xff; 
                int greenB = (rgbB >> 8) & 0xff; 
                int blueB = (rgbB) & 0xff;                      
                difference += Math.abs(redA - redB); 
                difference += Math.abs(greenA - greenB); 
                difference += Math.abs(blueA - blueB); 
            }
        }
    } else {
        return 0.0;
    }

    return 1-((difference/(double)pixels) / 255.0);
}   

The image is fully downloaded through an HttpURLConnection object wrapped in my own HttpGet object: byte[] captchaBytes = hg.readAndGetBytes();. I know this works because when I decode the bytes with ImageIO.read(new ByteArrayInputStream(captchaBytes)) and save the result, I get a valid image on my drive.

However, even when two images are actually the same, the result says they are not similar at all. But when I first save the image I downloaded from the online resource, re-read it, and then compare, the result says they are equal. This is what I mean by saving and re-reading:

File temp = new File("temp.jpeg");
ImageIO.write(secondNumberImage, "jpeg", temp);
secondNumberImage = ImageIO.read(temp);

Image format: JPEG

I suspect this has something to do with the compression applied by ImageIO.write(...), but how can I get the comparison to work without having to save the image first?
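One way to avoid the temporary file is to run the same JPEG encode/decode step against a byte array instead of a File. The following is only a sketch: the class name InMemoryJpegRoundTrip and the method roundTripJpeg are hypothetical, and it assumes that the reference images on the drive went through exactly one such ImageIO.write(..., "jpeg", ...) pass with the default writer settings. ImageIO.setUseCache(false) is called because ImageIO may otherwise back its streams with a cache file on disk.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class InMemoryJpegRoundTrip {

    /**
     * Encodes the image as JPEG into a byte array and decodes it again.
     * Hypothetical helper: mirrors ImageIO.write(img, "jpeg", file) followed
     * by ImageIO.read(file), but keeps the data in memory.
     */
    static BufferedImage roundTripJpeg(BufferedImage image) {
        // Ask ImageIO to use memory-backed streams rather than a disk cache.
        ImageIO.setUseCache(false);
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            // ImageIO.write returns false if no suitable writer was found.
            if (!ImageIO.write(image, "jpeg", buffer)) {
                throw new IllegalArgumentException("No JPEG writer accepted this image");
            }
            return ImageIO.read(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a digit cut out with getSubimage(...).
        BufferedImage digit = new BufferedImage(20, 30, BufferedImage.TYPE_INT_RGB);
        BufferedImage reencoded = roundTripJpeg(digit);
        System.out.println(reencoded.getWidth() + "x" + reencoded.getHeight());
    }
}
```

Because JPEG is lossy, each encode pass can shift pixel values slightly, so a comparison that tolerates small per-channel differences is likely to be more robust than matching the number of encode passes.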

Raghav
  • Don't you have to wait for the image to load? What are you doing to ensure that it's loaded before the comparison? I'm going to guess that saving the image ensures that it's loaded first. – Mad Physicist Feb 17 '19 at 07:53
  • I should have been more clear in the question. The image is loaded completely from a `HttpURLConnection` object wrapped in my own `HttpGet` object. And so I do: `byte[] captchaBytes = hg.readAndGetBytes();` Which I know works because when I save `BufferedImage captcha = ImageIO.read(new ByteArrayInputStream(captchaBytes));`, it saves as a valid image on my drive. – Raghav Feb 17 '19 at 07:55
  • Also, I know the download has completed because, while a downloaded image in memory doesn't compare correctly against an image loaded from the drive, if I save the downloaded image using `ImageIO` and re-read it using `ImageIO`, that image compares correctly with the other images on my drive. – Raghav Feb 17 '19 at 07:59
  • Don't put more information into comments. Always update your question instead. – GhostCat Feb 17 '19 at 07:59
  • That is strange indeed. – Mad Physicist Feb 17 '19 at 08:00
  • Yup. I suspect it's implicit compression from `ImageIO.write(...);` – Raghav Feb 17 '19 at 08:01
  • You are saving images as JPEG? No wonder that all the pixels are off by a tiny amount that is usually invisible. Use PNG instead of JPEG. – Roland Illig Feb 17 '19 at 09:47
  • The browser sends it as a `content-type: image/jpeg` itself. Anyways, I managed to fix it in my answer. – Raghav Feb 17 '19 at 09:56

1 Answer

The problem was in my bufferedImagesEqualConfidence method. Testing each pixel's packed RGB value for exact equality was not enough; I had to compare the individual R, G, and B values and measure how far apart they are.

My initial bufferedImagesEqualConfidence, which didn't work, was:

static double bufferedImagesEqualConfidence(BufferedImage img1, BufferedImage img2) {
    int similarity = 0;
    int pixels = img1.getWidth() * img1.getHeight(); 
    if (img1.getWidth() == img2.getWidth() && img1.getHeight() == img2.getHeight()) {
        for (int x = 0; x < img1.getWidth(); x++) {
            for (int y = 0; y < img1.getHeight(); y++) {
                if (img1.getRGB(x, y) == img2.getRGB(x, y)) {
                    similarity++;
                }
            }
        }
    } else {
        return 0.0;
    }

    return similarity / (double)pixels;
}

(Source: Java Compare one BufferedImage to Another)
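To illustrate why the exact-equality version is brittle, here is a small sketch with two packed RGB values that differ by a single step in the red channel, the kind of tiny offset lossy compression can introduce. The class name PackedRgbDemo and the helper channelDistance are made up for this example.

```java
public class PackedRgbDemo {

    /** Sum of absolute per-channel differences between two packed RGB values (0..765). */
    static int channelDistance(int rgbA, int rgbB) {
        int dr = Math.abs(((rgbA >> 16) & 0xff) - ((rgbB >> 16) & 0xff));
        int dg = Math.abs(((rgbA >> 8) & 0xff) - ((rgbB >> 8) & 0xff));
        int db = Math.abs((rgbA & 0xff) - (rgbB & 0xff));
        return dr + dg + db;
    }

    public static void main(String[] args) {
        int a = 0xFFC86432; // opaque pixel with R=200, G=100, B=50
        int b = 0xFFC96432; // same pixel with R=201

        // Exact comparison treats the pixels as completely different.
        System.out.println(a == b);                // prints false
        // Per-channel comparison shows they are almost identical.
        System.out.println(channelDistance(a, b)); // prints 1
    }
}
```

With exact equality, a pixel that is off by 1 counts the same as a pixel that is off by 255, so an image whose pixels are all slightly shifted can score close to 0.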

The bufferedImagesEqualConfidence that worked is:

static double bufferedImagesEqualConfidence(BufferedImage img1, BufferedImage img2) {
    double difference = 0;
    int pixels = img1.getWidth() * img1.getHeight(); 
    if (img1.getWidth() == img2.getWidth() && img1.getHeight() == img2.getHeight()) {
        for (int x = 0; x < img1.getWidth(); x++) {
            for (int y = 0; y < img1.getHeight(); y++) {
                int rgbA = img1.getRGB(x, y); 
                int rgbB = img2.getRGB(x, y); 
                int redA = (rgbA >> 16) & 0xff; 
                int greenA = (rgbA >> 8) & 0xff; 
                int blueA = (rgbA) & 0xff; 
                int redB = (rgbB >> 16) & 0xff; 
                int greenB = (rgbB >> 8) & 0xff; 
                int blueB = (rgbB) & 0xff;                      
                difference += Math.abs(redA - redB); 
                difference += Math.abs(greenA - greenB); 
                difference += Math.abs(blueA - blueB); 
            }
        }
    } else {
        return 0.0;
    }

    return 1-((difference/(double)pixels) / 255.0);
}

(Source: Image Processing in Java)

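So to measure the similarity of two images, especially JPEG images whose pixels can be off by a small amount, you have to compare the individual R/G/B values of each pixel and accumulate the differences, rather than test the whole packed RGB value for equality.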
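One caveat about the working method above: it adds up three channel differences per pixel but divides only by the pixel count and 255, so its result ranges from -2 to 1 rather than 0 to 1. A possible variant, sketched here under the assumption that a 0-to-1 score is wanted, also divides by the three channels. The class name NormalizedSimilarity and the method name similarity are hypothetical, and a threshold such as 0.98 tuned for the original formula would need to be re-tuned for this one.

```java
import java.awt.image.BufferedImage;

public class NormalizedSimilarity {

    /** Returns 1.0 for identical images, 0.0 for maximally different ones or mismatched sizes. */
    static double similarity(BufferedImage img1, BufferedImage img2) {
        int width = img1.getWidth();
        int height = img1.getHeight();
        if (width != img2.getWidth() || height != img2.getHeight()) {
            return 0.0;
        }
        long difference = 0;
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                int rgbA = img1.getRGB(x, y);
                int rgbB = img2.getRGB(x, y);
                difference += Math.abs(((rgbA >> 16) & 0xff) - ((rgbB >> 16) & 0xff));
                difference += Math.abs(((rgbA >> 8) & 0xff) - ((rgbB >> 8) & 0xff));
                difference += Math.abs((rgbA & 0xff) - (rgbB & 0xff));
            }
        }
        // Largest possible total: 3 channels * 255 per channel * number of pixels.
        double maxDifference = 3.0 * 255.0 * width * height;
        return 1.0 - difference / maxDifference;
    }

    public static void main(String[] args) {
        BufferedImage black = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage white = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < 4; x++) {
            for (int y = 0; y < 4; y++) {
                white.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println(similarity(black, black)); // prints 1.0
        System.out.println(similarity(black, white)); // prints 0.0
    }
}
```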

Raghav