
Problem: I am working with presentations, converting each slide into an image. I now need to check for duplicate slide images, either within the same presentation or against a newly uploaded presentation.

What I did: I save the images in a SQL Server database in Base64 format and compare them with the images from the newly uploaded file.

Issue: when I upload the same presentation again, the Base64 comparison works properly, but if I compare against a different presentation that contains the same slide, I do not get the same Base64 string. I've tried with byte[], but I still get the same issue.

Also, I want to know the best format (Base64, byte[], etc.) for storing images in SQL Server and comparing them.

Thank you in advance.

jarlh
  • `if i check with diffrent presentation with same slide it is not get same base 64 string` because they *are* different. BASE64 still represents the same bytes. You can't use a simple equality comparison to find if two *different* images are close enough. You need image processing algorithms. The only thing BASE64 does here is increase space usage. You could save the bytes in a `varbinary(max)` field instead but what you want is still not possible with any relational database – Panagiotis Kanavos Oct 20 '21 at 07:58
  • By "same", do you mean ABSOLUTELY IDENTICAL (then this is basically a byte comparison) or similar (e.g. a pixel difference, or a different software version doing the decoding)? The latter is a WAY more complex problem and NOT solvable in SQL (as you need to compare the bitmaps, which is not possible in SQL). – TomTom Oct 20 '21 at 07:59
  • `what is the bast format` images are binary data, not text. Their formats are JPG, PNG, GIF, etc. Since all of them are binary, the only relevant type is `varbinary(max)`. If you use a lossless format like PNG, then *maybe* you'll get the same bytes from the same slide every time. If you use a lossy format like JPG the same slide may result in different bytes every time you save it. – Panagiotis Kanavos Oct 20 '21 at 07:59
  • In SQL Server 2017 and later you can use Python packages through SQL, which means you can also use image processing and AI packages to check images for similarity. – Panagiotis Kanavos Oct 20 '21 at 08:02
  • Which dbms are you using? – jarlh Oct 20 '21 at 08:17
  • @jarlh MS SQL Relational DB – Umang Pandya Oct 20 '21 at 08:52
  • @PanagiotisKanavos PNG can produce a number of different byte sequences for the same image, depending on encoding settings (filter selection, compression level, etc). So even a lossless format isn't guaranteed to produce the same bytes for the same input. – Corey Oct 20 '21 at 09:42
  • @Corey indeed. And even if all parameters are identical, the slightest change in the source will produce completely different bytes. – Panagiotis Kanavos Oct 20 '21 at 09:48
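To make the comments' point about Base64 concrete, here is a small Python sketch (the byte string is a hypothetical stand-in for a rendered slide image). Base64 maps every 3 bytes to 4 characters, so it only adds about a third to the stored size, and two Base64 strings are equal exactly when the underlying bytes are equal. Switching between Base64 and byte[] therefore cannot make two differently encoded copies of the same slide compare equal.

```python
import base64

def base64_overhead(data: bytes) -> float:
    """Ratio of Base64 text length to raw byte length (about 4/3)."""
    return len(base64.b64encode(data)) / len(data)

# Hypothetical stand-in for the bytes of a rendered slide image (3072 bytes).
slide = bytes(range(256)) * 12
encoded = base64.b64encode(slide)
print(len(slide), len(encoded), base64_overhead(slide))  # 3072 4096 1.3333333333333333

# Change one byte: the bytes differ, and so do the Base64 strings.
other = slide[:-1] + b"\x00"
print(slide == other, encoded == base64.b64encode(other))  # False False
```

In SQL Server terms this is why the comments recommend storing the raw bytes in `varbinary(max)` rather than Base64 text.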

1 Answer


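If you want exact image comparison, it is simply a matter of computing a hash with your favorite hash algorithm and comparing the hashes, either manually or by letting the database do it (for example with `HASHBYTES` in SQL Server). Keep in mind that a hash only matches byte-identical input, so if the same slide can be encoded with different settings, hash the decoded pixel data rather than the image file.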
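A minimal Python sketch of exact matching on decoded pixels rather than file bytes. It assumes you already have each slide's raw pixel buffer and dimensions (in practice an imaging library would decode the PNG, e.g. Pillow's `Image.tobytes()`); here `zlib` stands in for the Deflate compression PNG uses, to show that different compression settings give different file bytes for the same pixels. The function name `pixel_fingerprint` is hypothetical.

```python
import hashlib
import zlib

def pixel_fingerprint(pixels: bytes, width: int, height: int) -> str:
    """SHA-256 over the image size plus its decoded pixel buffer (not the file bytes)."""
    h = hashlib.sha256()
    h.update(width.to_bytes(4, "big"))
    h.update(height.to_bytes(4, "big"))
    h.update(pixels)
    return h.hexdigest()

# A tiny 2x2 RGB "slide" (hypothetical data).
pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])

# Same pixels compressed at two levels: the stored bytes differ.
file_fast = zlib.compress(pixels, 1)
file_small = zlib.compress(pixels, 9)
print(file_fast == file_small)  # False: hashing these bytes would miss the duplicate

# Hashing the decoded pixels finds the duplicate.
print(pixel_fingerprint(zlib.decompress(file_fast), 2, 2)
      == pixel_fingerprint(zlib.decompress(file_small), 2, 2))  # True
```

The hex digest is small and fixed-size, so it can go in an indexed column and be compared with a plain equality lookup. Any rendering difference (anti-aliasing, a different renderer version) still changes the pixels, which is where approximate comparison comes in.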

If you want approximate image comparison, you will need to use an image hash (also called a perceptual hash) and treat two images as duplicates when their hashes differ in only a few bits. Such algorithms are available in OpenCV and can be used from C# via the Emgu CV wrapper, or from several other platforms. You will probably need to either compute the hashes yourself or use Python packages in SQL Server, as Panagiotis Kanavos mentions.
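To illustrate what an image hash does, here is a dependency-free Python sketch of one simple variant, the average hash. OpenCV's contrib `img_hash` module provides this and other hashes, but this is a hand-rolled illustration, not OpenCV's implementation. It assumes the slide has already been decoded to a grayscale grid of 0..255 values, and the `max_distance` threshold of 5 bits is a hypothetical starting point to tune on real slides.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in 0..255 (already decoded)

def average_hash(img: Gray, size: int = 8) -> int:
    """Average hash: shrink to size x size by block-averaging, then set one bit
    per cell depending on whether the cell is brighter than the mean cell."""
    height, width = len(img), len(img[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            total = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            cells.append(total / ((y1 - y0) * (x1 - x0)))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of bits in which two hashes differ."""
    return bin(a ^ b).count("1")

def looks_similar(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    # max_distance is a hypothetical threshold, not a recommended value.
    return hamming(average_hash(a), average_hash(b)) <= max_distance

if __name__ == "__main__":
    gradient = [[x * 8 for x in range(32)] for _ in range(32)]
    brighter = [[min(255, v + 5) for v in row] for row in gradient]
    inverted = [[255 - v for v in row] for row in gradient]
    print(looks_similar(gradient, brighter))  # True
    print(looks_similar(gradient, inverted))  # False
```

A slightly brighter copy keeps the same hash, while a genuinely different image flips many bits. The 64-bit hashes can be stored next to the images; the distance check itself usually runs in application code (or in Python inside SQL Server), since plain SQL has no natural index for "differs in at most N bits".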

Images are typically stored as binary blobs (`varbinary(max)` in SQL Server) containing the compressed image files, but storing raw image data is also possible. However, I would consider whether storing images directly in the database is the best approach; in many cases using the file system is preferred. See storing images in databases for the full discussion.
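One way to combine these ideas is sketched below in Python, with the standard-library `sqlite3` module standing in for SQL Server. It follows the file-system option: the image file goes to disk, and the database keeps only the path plus an indexed fingerprint of the decoded pixels, so a duplicate check is an index lookup rather than a comparison of blobs. The table, column and function names are hypothetical, and the caller is assumed to supply the decoded pixel bytes.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) slide_image table with an index on the fingerprint."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS slide_image ("
        " id INTEGER PRIMARY KEY,"
        " presentation TEXT NOT NULL,"
        " slide_no INTEGER NOT NULL,"
        " file_path TEXT NOT NULL,"
        " fingerprint TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_slide_fingerprint ON slide_image (fingerprint)"
    )
    return conn

def save_slide(conn: sqlite3.Connection, image_dir: Path, presentation: str,
               slide_no: int, image_bytes: bytes, pixels: bytes) -> list:
    """Write the image file to disk, record its pixel fingerprint, and return
    (presentation, slide_no) rows that already had the same fingerprint."""
    fingerprint = hashlib.sha256(pixels).hexdigest()
    duplicates = conn.execute(
        "SELECT presentation, slide_no FROM slide_image WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchall()
    path = image_dir / f"{presentation}_{slide_no}.png"
    path.write_bytes(image_bytes)
    conn.execute(
        "INSERT INTO slide_image (presentation, slide_no, file_path, fingerprint)"
        " VALUES (?, ?, ?, ?)",
        (presentation, slide_no, str(path), fingerprint),
    )
    conn.commit()
    return duplicates

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_store()
        # Same decoded pixels, different encoded file bytes (hypothetical data).
        print(save_slide(conn, Path(tmp), "deck_a", 1, b"png-v1", b"same pixels"))  # []
        print(save_slide(conn, Path(tmp), "deck_b", 3, b"png-v2", b"same pixels"))  # [('deck_a', 1)]
        conn.close()
```

If you do keep the images inside SQL Server instead, the same layout applies with a `varbinary(max)` column in place of `file_path`.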

JonasH