I have a Python script that uses Pillow to open all png/jpg/jpeg files in a folder, copy some of their metadata (file name, size, width, height, pixels, etc.) into a new object, and push that object into a list called imageMetaData. I then traverse that list, comparing every image to every other image, to find and delete duplicate images (I have amassed a TON of duplicates: of my ~6000 images, at least 1500 may be duplicates).
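The comparison pass looks roughly like this (a simplified sketch, not my exact code: the field names match the loader below, and the duplicate test here is just metadata equality):

```python
# Compare every entry against every later entry; report pairs whose
# basic metadata matches (candidate duplicates).
def find_duplicates(image_meta_data):
    duplicates = []
    for a in range(len(image_meta_data)):
        for b in range(a + 1, len(image_meta_data)):
            first, second = image_meta_data[a], image_meta_data[b]
            if (first['width'] == second['width']
                    and first['height'] == second['height']
                    and first['size'] == second['size']):
                duplicates.append((first['name'], second['name']))
    return duplicates
```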
With a small set of images (~1500 is the biggest I have done successfully) it works fine, but when I run it on my folder of 6100 files it fails to build the imageMetaData list and prints:
zsh: killed python3 remove-duplicates.py
I have looked into this and it seems to be running out of RAM. But it seems like my RAM should easily hold a list of ~6000 objects where each object has about 8 fields.
My function is below:
from PIL import Image
from os import listdir
import os

mypath = 'my-path-to-folder/remove-dupes/'
initialLocation = 'my-folder-of-photos'
directoryList = listdir(mypath + initialLocation)

def loadObjects():
    myObjects = []
    if len(directoryList) > 1:
        for x in range(len(directoryList)):
            if ('jp' in directoryList[x].lower() or 'png' in directoryList[x].lower()):
                i = Image.open(mypath + initialLocation + '/' + directoryList[x])
                width, height = i.size
                pixels = i.load()
                i.close()
                myObjects.append({
                    'name': directoryList[x],
                    'width': width,
                    'height': height,
                    'pixels': pixels,
                    'size': os.stat(mypath + initialLocation + '/' + directoryList[x]).st_size,
                    'biggest': directoryList[x],
                    'index': x
                })
    return myObjects
As can be seen, the image is opened, loaded, and closed (correctly?), so I don't think I am leaving anything hanging. Any ideas as to why this is being killed, or how to get more details on why it was killed?
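In case it helps, one way to get actual numbers on memory growth while the list builds (on macOS/Linux only, via the stdlib resource module; the bytearray here is just a stand-in for per-image data) would be something like:

```python
import resource

def peak_memory():
    # ru_maxrss is the process's peak resident set size:
    # reported in bytes on macOS, kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_memory()
data = [bytearray(10_000_000)]  # stand-in for holding one image's pixel data
after = peak_memory()
print(f'peak rss grew by roughly {after - before}')
```

Printing that inside the loop every few hundred files would show whether memory climbs steadily until the OS kills the process.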