3

I have many files of the same picture in various resolutions, suitable for different devices like mobile, PC, PSP, etc. Now I am trying to display only the unique pictures on the page, but I don't know how. I could have avoided this if I had maintained a database in the first place, but I didn't. I need your help detecting the largest unique pictures.

jwueller
  • 30,582
  • 4
  • 66
  • 70
mrN
  • 3,734
  • 15
  • 58
  • 82
  • 2
    Is there not even a similarity in the names? Otherwise you would end up (provided you find an algorithm that does the trick) in an N:M comparison situation. Maybe this can help: http://stackoverflow.com/questions/2037205/image-comparison-with-php-gd but remember, if you have 1000 images that would be 999,999 comparison operations – Hannes Jan 05 '11 at 09:41
  • 2
    Start maintaining a database if you didn't in the first place. Having made a mistake in the past doesn't mean you have to keep going and warp all your code/logic around that mistake. – Poelinca Dorin Jan 05 '11 at 09:41
  • 2
    Wow, this is going to be very expensive in terms of CPU cycles. You'll need to downsize every image A using the same algorithm that you used the first time (A is the bigger image of the two). If the downsized A and B are equivalent, store that information somehow and continue with the next pair. This could be O(N²), so you should think twice if you're dealing with a large amount of data. I think you should fix your database. No matter what. – jwueller Jan 05 '11 at 09:43
  • @poelinca, yes, to start maintaining the new database I need to extract the unique images... @hannes, there is no uniqueness in the names. I have about 30,000+ pictures – mrN Jan 05 '11 at 09:50
  • 1
    @elusive, I didn't maintain any database before, but I am trying to now; for that I need the unique and largest images.... – mrN Jan 05 '11 at 09:55
  • 2
    @mrNepal: Given 10 milliseconds for each comparison: `30000*30000*10/1000/60/60/24 = 104.17`. You'll need a bit more than a hundred days to finish this. Wohoo! – jwueller Jan 05 '11 at 09:57
  • @mrNepal so the names are totally random, or can you at least group them by their name ? – Hannes Jan 05 '11 at 10:02
  • @hannes, no..... it is totally random – mrN Jan 05 '11 at 10:11

4 Answers

15

Install GD2 and libpuzzle on your server.

libpuzzle is astonishing and easy to work with. Check this snippet:

<?php
# Compute signatures for two images
$cvec1 = puzzle_fill_cvec_from_file('img1.jpg');
$cvec2 = puzzle_fill_cvec_from_file('img2.jpg');

# Compute the distance between both signatures
$d = puzzle_vector_normalized_distance($cvec1, $cvec2);

# Are pictures similar?
if ($d < PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD) {
  echo "Pictures are looking similar\n";
} else {
  echo "Pictures are different, distance=$d\n";
}

# Compress the signatures for database storage
$compress_cvec1 = puzzle_compress_cvec($cvec1);
$compress_cvec2 = puzzle_compress_cvec($cvec2);
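Building on the snippet above, here is a rough, untested sketch of a full de-duplication pass that keeps only the largest copy of each picture. It assumes the libpuzzle PECL extension is loaded and a flat directory of JPEGs; the path is illustrative:

```php
<?php
// Sketch (untested): de-duplicate a folder of images with libpuzzle,
// keeping only the largest copy of each picture.
// Assumes the libpuzzle PECL extension is loaded; the path is illustrative.

$files = glob('/path/to/images/*.jpg');

// Precompute one signature and one pixel area per file,
// so the O(N^2) loop below does no repeated disk I/O.
$cvecs = array();
$areas = array();
foreach ($files as $file) {
    $cvecs[$file] = puzzle_fill_cvec_from_file($file);
    list($w, $h)  = getimagesize($file);
    $areas[$file] = $w * $h;
}

$discard = array(); // smaller near-duplicates to hide
$n = count($files);
for ($i = 0; $i < $n; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
        $a = $files[$i];
        $b = $files[$j];
        if (isset($discard[$a]) || isset($discard[$b])) {
            continue; // already eliminated
        }
        $d = puzzle_vector_normalized_distance($cvecs[$a], $cvecs[$b]);
        if ($d < PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD) {
            // Same picture: keep whichever has more pixels.
            $discard[$areas[$a] >= $areas[$b] ? $b : $a] = true;
        }
    }
}

// The unique, largest images to display.
$unique = array_diff($files, array_keys($discard));
```

Note that this is still O(N²) comparisons, so for 30,000 images you would want to combine it with a grouping heuristic (e.g. bucketing by an approximate hash first) rather than running it over the whole set.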
Xavier Barbosa
  • 3,919
  • 1
  • 20
  • 18
2

Well, even though there are quite a few algorithms to do that, I believe it would still be faster to do it manually. Download all the images and feed them into something like Windows Live Photo Gallery or any other software that can match similar images. This will take you a few hours, but implementing an image-matching algorithm could take far more. After that, you could spend the extra time amending your current system to store everything in a DB. Fix the cause of the problem, not its symptoms.

Ivan
  • 3,567
  • 17
  • 25
  • The question isn't "Why should you or shouldn't you detect similar images in PHP" - you cannot possibly know all the permutations of reasons for why people need to compare images in PHP. Saying to do it manually, is NOT a good answer to the actual question. – Dave Hilditch Apr 27 '17 at 19:01
0

Firstly, your problem has hardly anything to do with PHP, so I have removed that tag and added more relevant tags.


Doing it smartly will not require NxN comparisons. You can use lots of heuristics, but first I would like to ask you:

  1. Are all the copies of one image exact resizes of each other (is any cropping done? Matching cropped images to the original could be more difficult and time-consuming)?

  2. Are all images generated (resized) using the same tool?

  3. What about parameters you have used to resize? For example, are all pictures for displaying on PSP in the same resolution?

  4. What is your estimate of how many unique images you have (i.e., how many copies of each picture there might be, on average)?

  5. Do you have any kind of categorization already done? For example, are all mobile images in a separate folder (or of a different resolution than the PC images)? This alone could reduce the number of comparisons a lot, even if you do brute force otherwise.

A very top-level hint on why you don't need NxN comparisons: you can devise many different approximate hashes (for example, the distribution of high/low-frequency JPEG coefficients) and group "potentially" similar images together. This can reduce the number of comparisons required by 10-100 times or even more, depending on the quality of the heuristic and the data set. The hashing can even be done on parts of images. 30,000 is not a very large number if you use the right techniques.
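As an illustration of the hashing idea, here is a rough, untested sketch of an "average hash" using GD: downscale each image to 8x8 grayscale, threshold each pixel against the mean brightness, and bucket images by the resulting 64-bit string. Only images inside the same bucket need a full comparison. The path is illustrative:

```php
<?php
// Sketch (untested): bucket images by a coarse "average hash" so that
// expensive full comparisons only run within buckets.
// Requires the GD extension; the path is illustrative.

function average_hash($file) {
    $src  = imagecreatefromjpeg($file);
    $tiny = imagecreatetruecolor(8, 8);
    imagecopyresampled($tiny, $src, 0, 0, 0, 0, 8, 8,
                       imagesx($src), imagesy($src));

    // Collect the 64 grayscale values of the thumbnail.
    $gray = array();
    for ($y = 0; $y < 8; $y++) {
        for ($x = 0; $x < 8; $x++) {
            $rgb = imagecolorat($tiny, $x, $y);
            $r = ($rgb >> 16) & 0xFF;
            $g = ($rgb >> 8) & 0xFF;
            $b = $rgb & 0xFF;
            $gray[] = ($r + $g + $b) / 3;
        }
    }
    imagedestroy($src);
    imagedestroy($tiny);

    // One bit per pixel: is it brighter than the mean?
    $mean = array_sum($gray) / 64;
    $bits = '';
    foreach ($gray as $v) {
        $bits .= ($v > $mean) ? '1' : '0';
    }
    return $bits;
}

// Group candidate duplicates by hash; only compare within a group.
$buckets = array();
foreach (glob('/path/to/images/*.jpg') as $file) {
    $buckets[average_hash($file)][] = $file;
}
```

In practice, resizing and cropping can flip a few bits, so you would compare hashes by Hamming distance rather than exact equality; but even exact bucketing turns one 30,000-image problem into many small ones.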

jwueller
  • 30,582
  • 4
  • 66
  • 70
  • Yes, they have been cropped and resized; the images are 480x272, 800x600, 1024x768, 1280x1024, 1600x1200, 1600x1080, 1920x1080, 1920x1200, 2560x1600. There should be about 3,500 unique images, and about 9 exact copies are made of each picture. About categorisation... I have placed these images in about 30 folders, each containing 1000 pictures; upon exceeding the size limit a new folder is created and used. File names are random to bring up different images. – mrN Jan 06 '11 at 11:38
  • Then I randomly select about 20 images from each folder and create a file list. Then I run them through a function where the images are sorted according to size and automatically copied to the respective device if they exist.. – mrN Jan 06 '11 at 11:39
  • The cropping algorithm is automatic upon upload..... I resize to the height, then the image is centered to the specific resolution. I normally provide an image size of 2560 x 1600. If I upload a bigger image first, a 2560 x 1600 image will be taken and the source image will be discarded. – mrN Jan 06 '11 at 11:42
  • Can't I create a script to compare images like CBIR software such as iMatch or others? – mrN Jan 06 '11 at 11:43
  • The reason for the `php` tag is that the images are online and I use a local website to manage them; the whole website is developed in PHP, and I want PHP code to isolate the other files and only show the biggest images. – mrN Jan 06 '11 at 11:45
  • IMO, a language like C++ or a tool like ImageMagick would be much faster for image processing. You should only do things in PHP for which you need an online interface and/or that are not one-time tasks. Of course, no harm in trying PHP's GD library (but I doubt it has any functions that don't have a faster alternative). –  Jan 11 '11 at 03:47
  • @mrNepal: When you say 9 exact copies - I assume that those are the copies you want to match. What I meant was that it will be difficult to match an image with its cropped version. If *all* copies are cropped from the original but are exact copies, that should not be a problem. I second Ivan's suggestion. –  Jan 11 '11 at 03:49
-1

You should check which of the two images is smaller, take its dimensions, and then compare only the pixels within that rectangle.

TJHeuvel
  • 12,403
  • 4
  • 37
  • 46