
I have many images in a folder and I want to delete images that have the same size (width and height), keeping one of each. My code below, using PIL, works, but I want to know if there is a more efficient way to achieve this.

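import os
from PIL import Image

def checkImages(dirs):
    image_list = os.listdir(dirs)
    i = 0
    while i < len(image_list):
        with Image.open(os.path.join(dirs, image_list[i])) as im_a:
            size_a = im_a.size
        j = i + 1
        while j < len(image_list):
            with Image.open(os.path.join(dirs, image_list[j])) as im_b:
                size_b = im_b.size
            if size_a == size_b:
                # Same dimensions: delete the later file and drop it from the list
                os.remove(os.path.join(dirs, image_list[j]))
                del image_list[j]  # the next file shifts into slot j, so don't advance j
            else:
                j += 1
        i += 1

checkImages('/content/gdrive/MyDrive/testdata')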
Asked by Nemo17 (edited by smci)

2 Answers


You can keep a dictionary of sizes and delete any image whose size has already been seen. That way you don't need a nested loop, and you don't have to open the same file multiple times.

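import os
from PIL import Image

def checkImages(dirs):
    sizes = {}  # maps (width, height) -> first file seen with that size
    for file in os.listdir(dirs):
        path = os.path.join(dirs, file)
        with Image.open(path) as im:
            size = im.size  # Image.open is lazy; .size comes from the header
        if size in sizes:
            os.remove(path)  # the file was closed when the with block exited
        else:
            sizes[size] = file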
Answered by Håken Lid
    This can also be combined with the _imagesize_ library suggested by the other answer, which might help speed up the operation further. – Håken Lid Feb 23 '22 at 09:30

Please see this question: Get Image size WITHOUT loading image into memory

You can use the imagesize library:

import os
import imagesize

def checkImages(dirs):
    image_list = os.listdir(dirs)
    i = 0
    while i < len(image_list):
        # imagesize.get reads the file header and returns (width, height)
        size_a = imagesize.get(os.path.join(dirs, image_list[i]))
        j = i + 1
        while j < len(image_list):
            size_b = imagesize.get(os.path.join(dirs, image_list[j]))
            if size_a == size_b:
                os.remove(os.path.join(dirs, image_list[j]))
                del image_list[j]  # the next file shifts into slot j, so don't advance j
            else:
                j += 1
        i += 1

checkImages('/content/gdrive/MyDrive/testdata')
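To illustrate what reading the size "without loading the image" means, here is a stdlib-only sketch for PNG files only. It is not imagesize's actual code; it relies only on the PNG layout, where the 8-byte signature is followed by the IHDR chunk (4-byte length, the bytes `IHDR`, then width and height as big-endian 32-bit integers). The function name `png_size` is hypothetical, and returning `(-1, -1)` for files it cannot parse is an assumption chosen to mirror imagesize's convention.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) of a PNG by reading only its first 24 bytes."""
    with open(path, "rb") as f:
        head = f.read(24)
    # Bytes 0-7: signature; 8-11: IHDR length; 12-15: b"IHDR"; 16-23: width, height
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return (-1, -1)  # not a PNG header this sketch understands
    width, height = struct.unpack(">II", head[16:24])
    return (width, height)
```

Only a fixed-size header is read regardless of how large the file is, which is why this kind of reader can be faster than a general-purpose image library.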
Answered by ibadia
    I believe your approach just uses a different library instead of `PIL`. I'm not sure it makes much difference compared to my approach. – Nemo17 Feb 23 '22 at 09:24
  • According to the benchmark on GitHub, there could be a noticeable speed improvement over PIL (Pillow). https://github.com/shibukawa/imagesize_py#benchmark – Håken Lid Feb 23 '22 at 09:31
  • It's just a different library, which is faster than PIL and does not load the image into memory. Combine the imagesize approach with the other answer and it will be the most efficient. – ibadia Feb 23 '22 at 09:37
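Following the suggestion in the comments to combine both answers, here is a minimal sketch that uses the dictionary approach with `imagesize.get` for the header read. Assumptions: the function name `remove_same_size_images` and the `get_size` and `dry_run` parameters are hypothetical additions for illustration; files are visited in sorted name order so the kept file is deterministic; and files for which the size comes back as `(-1, -1)` (imagesize could not determine it) are skipped rather than treated as duplicates of each other.

```python
import os
from typing import Callable, Dict, List, Optional, Tuple

Size = Tuple[int, int]


def remove_same_size_images(dirs: str,
                            get_size: Optional[Callable[[str], Size]] = None,
                            dry_run: bool = False) -> List[str]:
    """Keep the first file (in sorted name order) for each (width, height); remove the rest.

    Returns the paths that were removed, or that would be removed when dry_run is True.
    """
    if get_size is None:
        import imagesize  # third-party: pip install imagesize
        get_size = imagesize.get  # reads only the file header
    seen: Dict[Size, str] = {}  # (width, height) -> name of the file being kept
    removed: List[str] = []
    for name in sorted(os.listdir(dirs)):
        path = os.path.join(dirs, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        size = get_size(path)
        if size == (-1, -1):
            continue  # size unknown; don't treat such files as duplicates
        if size in seen:
            removed.append(path)
            if not dry_run:
                os.remove(path)
        else:
            seen[size] = name
    return removed
```

For example, `remove_same_size_images('/content/gdrive/MyDrive/testdata', dry_run=True)` returns the paths that would be deleted without touching anything, so you can check the list before running it for real. Each file is read once, so the work grows linearly with the number of files instead of quadratically as with the nested loops.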