Comparing two scanned documents and getting percentage of difference in C#

Question

I am trying to write a program that checks whether a newly scanned document has already been scanned into the system. All previously scanned documents reside in one folder. However, I'm having difficulty finding a way to measure the differences between documents properly. The only thing the program can do accurately is tell whether two images are 100% the same or not, and even that might be subject to error if the lighting conditions differ. So far I have only tested it with 2 identical images of the same document, so they're basically just copies of each other.

Whenever I scan a different document or picture, all the white that is in similar places counts towards the overall similarity, which is something I don't want. How can I make it exclude the white pixels and compare only the black pixels? Color isn't a problem because all the documents will be in black and white. Here is the code that I have at the moment.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace ImageCompare {
    class TestingClass2
    {

        public Double Compare(Image img1_, Image img2_)
        {
            //total number of pixels
            int pixelNb = img1_.Width * img1_.Height;

            Double percent = 100;

            Bitmap resized_img2_ = ResizeBitmap((Bitmap)img2_, img1_.Width, img1_.Height);

            for (int i = 0; i < img1_.Width; i++)
            {
                for (int j = 0; j < img1_.Height; j++)
                {
                    percent -= ColorCompare(((Bitmap)img1_).GetPixel(i, j),
                    ((Bitmap)resized_img2_).GetPixel(i, j)) / pixelNb;
                }
            }

            return percent;
        }

        public Bitmap ResizeBitmap(Bitmap b, int nWidth, int nHeight)
        {
            Bitmap result = new Bitmap(nWidth, nHeight);
            using (Graphics g = Graphics.FromImage((Image)result))
                g.DrawImage(b, 0, 0, nWidth, nHeight);
            return result;
        }

        public Double ColorCompare(Color c1, Color c2)
        {
            // Sum of per-channel differences, scaled to a 0-100 range (3 channels, 255 max each).
            return (Math.Abs(c1.B - c2.B) + Math.Abs(c1.R - c2.R) + Math.Abs(c1.G - c2.G)) * 100.0 / (3 * 255);
        }

        public void showresult(Bitmap img1, Bitmap img2)
        {


            String degreeofSimilarity = "";

            Double simcheck;

            simcheck = Compare(img1, img2);

            String var = "";

            // Thresholds are tested from highest to lowest so that every branch is reachable.
            if (simcheck == 100)
            {
                var = "Identical";
            }
            else if (simcheck >= 95)
            {
                var = "Almost Identical";
            }
            else if (simcheck > 50)
            {
                var = "Very Similar";
            }
            else if (simcheck > 30)
            {
                var = "Slightly Similar";
            }
            else if (simcheck <= 30)
            {
                var = "Not Similar";
            }
            else
            {
                // Only reached when simcheck is NaN.
                var = "UNKNOWN ERROR";
            }

            degreeofSimilarity = var;

            MessageBox.Show("Comparison Result is: " + simcheck + " // - The images are " + degreeofSimilarity);

        }

    }

}

Looking forward to your answers. If any part is unclear, please feel free to ask below. Note that the code above is only the comparison part.
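A minimal sketch of the "compare only the black pixels" idea asked about above, offered as one possible reading rather than a tested solution. It turns each page into an ink mask (true where a pixel is dark) and reports the share of ink pixels the two pages have in common, out of all pixels that are ink on at least one page, so matching white areas contribute nothing. The names `InkSimilarity`, `ToInkMask` and `CompareInk`, the `byte[,]` grayscale input indexed as `[x, y]`, and the cutoff of 128 are all assumptions made for illustration; none of them come from the code above.

```csharp
using System;

public static class InkSimilarity
{
    // Hypothetical helper: marks a pixel as ink when its gray value is below
    // the threshold. The default of 128 is an assumed cutoff, not a tuned one.
    public static bool[,] ToInkMask(byte[,] gray, byte threshold = 128)
    {
        int width = gray.GetLength(0);
        int height = gray.GetLength(1);
        var mask = new bool[width, height];

        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                mask[x, y] = gray[x, y] < threshold;

        return mask;
    }

    // Returns 100 * (pixels that are ink in both) / (pixels that are ink in either).
    // Pixels that are white in both masks are ignored entirely.
    public static double CompareInk(bool[,] a, bool[,] b)
    {
        if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1))
            throw new ArgumentException("Masks must have the same dimensions.");

        long both = 0;
        long either = 0;

        for (int x = 0; x < a.GetLength(0); x++)
        {
            for (int y = 0; y < a.GetLength(1); y++)
            {
                if (a[x, y] && b[x, y]) both++;
                if (a[x, y] || b[x, y]) either++;
            }
        }

        // Two blank pages have no ink to disagree about, so treat them as identical.
        return either == 0 ? 100.0 : 100.0 * both / either;
    }
}
```

With `System.Drawing`, the gray values could be filled from `(byte)(bitmap.GetPixel(x, y).GetBrightness() * 255)` after resizing both images to the same size, as `Compare` already does. This measure still compares pixel positions directly, so it remains sensitive to any shift or rotation between the two scans.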

Not a duplicate, but I've had something similar here, hope this helps: https://stackoverflow.com/questions/35151067/algorithm-to-compare-two-images-in-c-sharp/35153895#35153895 — fubo, Feb 13 '18 at 10:45
Scanned documents: so you are comparing images. What does a "*n%* difference" mean? — Richard, Feb 13 '18 at 10:45
I doubt you will get useful results from such a simple comparison; I would expect all pixels to be different both in color and in placement. I think you will need either a lot more sophisticated preparation or a few clever tricks. One that comes to mind would be to include dots at a defined position in each document to help align them. Another would be to find markers that allow some kind of fingerprint/hash. Ignoring bright pixels should be trivial. — TaW, Feb 13 '18 at 10:45
if you're talking about the same doc scanned *separately* twice - something as simple as a millimetre offset on the scanner (at 300 dpi, 1mm==12 pixels) or a slight rotation is going to make this approach *very* unreliable. It might work **very occasionally**, but... frankly I wonder whether some kind of machine learning approach might be better here; you could use image manipulation tools (offset, scale, lighting, etc) to generate a corpus of training images to simulate both positive and negative results. — Marc Gravell, Feb 13 '18 at 10:47
Duplicate detection is anything but a trivial task. I suggest you find a library that does exactly this. It may well be closed-source _and_ expensive though. — Fildor, Feb 13 '18 at 11:20
I see, thank you all for the replies. It has been a pain trying to find a way to make this work 100%. If anyone has suggestions for decent free libraries, that would be great; otherwise it's alright. — Khalid Idris, Feb 13 '18 at 11:26
Invert the brightness matrix to work with the non-white only. Search for patterns and create some scanlines as a reference for comparison. [OpenCV](https://opencv.org/) can really help you here. See these CodeProject articles: [Contour Analysis for Image Recognition](https://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C), [Image Recognition with Neural Networks](https://www.codeproject.com/Articles/19323/Image-Recognition-with-Neural-Networks). — Jimi, Feb 14 '18 at 05:13
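The comments above point out that a small scanner offset defeats a pixel-by-pixel comparison (300 dpi is 300 / 25.4 ≈ 11.8 pixels per millimetre, consistent with the rough figure of 12 quoted) and suggest some kind of fingerprint. The sketch below is one simple, assumed form of such a fingerprint, not a technique named by any commenter: it lays a coarse grid over an ink mask (true = dark pixel), records the ink density of each cell, and compares two fingerprints with a weighted Jaccard ratio so that empty cells do not inflate the score. `GridFingerprint`, `Build`, `Similarity` and the grid size are hypothetical names and parameters. It only absorbs shifts that are small relative to a cell; it does not handle rotation or scaling.

```csharp
using System;

public static class GridFingerprint
{
    // Builds a cols x rows grid of ink densities (0..1) from a mask indexed [x, y].
    public static double[] Build(bool[,] ink, int cols, int rows)
    {
        if (cols <= 0 || rows <= 0)
            throw new ArgumentOutOfRangeException("Grid dimensions must be positive.");

        int width = ink.GetLength(0);
        int height = ink.GetLength(1);
        var density = new double[cols * rows];
        var area = new int[cols * rows];

        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                // Map the pixel to its grid cell.
                int cell = (y * rows / height) * cols + (x * cols / width);
                area[cell]++;
                if (ink[x, y]) density[cell]++;
            }
        }

        // Convert ink counts to densities; cells with no pixels stay at 0.
        for (int i = 0; i < density.Length; i++)
            if (area[i] > 0) density[i] /= area[i];

        return density;
    }

    // Weighted Jaccard ratio: 100 * sum(min) / sum(max) over all cells.
    // Cells that are empty in both fingerprints add nothing to either sum.
    public static double Similarity(double[] f1, double[] f2)
    {
        if (f1.Length != f2.Length)
            throw new ArgumentException("Fingerprints must have the same length.");

        double sumMin = 0;
        double sumMax = 0;
        for (int i = 0; i < f1.Length; i++)
        {
            sumMin += Math.Min(f1[i], f2[i]);
            sumMax += Math.Max(f1[i], f2[i]);
        }

        // No ink anywhere: treat two blank pages as identical.
        return sumMax == 0 ? 100.0 : 100.0 * sumMin / sumMax;
    }
}
```

A coarser grid tolerates larger offsets but also makes different documents look more alike, so the grid size would need tuning against real scans before any threshold on the score could be trusted.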
