I tried:
- magic library
- PIL (Python Imaging Library)
- imagesize library
What I discovered:
- Using regex to parse the string output from magic is unreliable because the format of that string differs between file types. If your system accepts several image types, you have to make sure you parse the output correctly for each one. See my test's output below for what the string looks like for each file type.
- The imagesize library is not very robust. Out of the box it currently analyzes JPEG/JPEG 2000/PNG/GIF/TIFF/SVG/Netpbm/WebP. When it tries to find the size of a file it's not equipped to handle, it gives you a size of (-1, -1). You can see this in my .BMP example below.
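To illustrate the first point, here is a small sketch that runs one naive `width x height` pattern over the description strings magic printed in my test (copied from the output further down, so neither magic nor the image files are needed). The names `MAGIC_OUTPUT`, `NAIVE` and `naive_matches` are made up for this illustration.

```python
import re

# Description strings copied from the magic output in the test below.
MAGIC_OUTPUT = {
    "png": "PNG image data, 636 x 636, 8-bit/color RGB, non-interlaced",
    "jpg": ("JPEG image data, JFIF standard 1.01, resolution (DPI), "
            "density 96x96, segment length 16, baseline, precision 8, "
            "636x636, frames 3"),
    "tif": ("TIFF image data, little-endian, direntries=16, height=636, "
            "bps=63732, compression=LZW, PhotometricIntepretation=RGB, "
            "width=636"),
    "bmp": "PC bitmap, Windows 3.x format, 636 x 636 x 32",
}

# A single "digits x digits" pattern, the obvious first attempt.
NAIVE = re.compile(r"(\d+)\s?x\s?(\d+)")


def naive_matches(description):
    """Return every (width, height)-looking pair found in the string."""
    return NAIVE.findall(description)


if __name__ == "__main__":
    for kind, description in MAGIC_OUTPUT.items():
        print(kind.ljust(4), naive_matches(description))
```

For the JPEG string the first hit is the density (96x96), not the image size, which is why my script takes the last match. For the TIFF string the pattern finds nothing at all, so that format needs its own pattern.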
My conclusion: I would choose PIL. It's not as fast as the imagesize library, but it handles more file types, and I think it's "fast enough" for most use cases. Using re to parse the magic output is not reliable, and it's much slower than PIL.
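If the speed of imagesize matters to you, one option is to try it first and only fall back to PIL when it reports (-1, -1). This is just a sketch, not something I benchmarked: `size_with_fallback` and `pil_size` are hypothetical helper names, and the readers are passed in as arguments so the logic can be tried without either library installed.

```python
def size_with_fallback(path, fast_reader, robust_reader):
    """Return (width, height), preferring the fast reader.

    fast_reader is assumed to behave like imagesize.get, i.e. to
    return (-1, -1) for formats it cannot handle. In that case the
    slower but more robust reader is used instead.
    """
    width, height = fast_reader(path)
    if width < 0 or height < 0:
        return tuple(robust_reader(path))
    return width, height


def pil_size(path):
    """Read the size with PIL; imported lazily so it stays optional."""
    from PIL import Image

    # The context manager closes the file handle once the size is read.
    with Image.open(path) as image:
        return image.size
```

With both libraries installed, a call would look like `size_with_fallback('bmp_image.bmp', imagesize.get, pil_size)`, which for the .BMP case in my test would end up going through PIL.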
My Test
I took an image saved on my hard drive (636 x 636) and saved it in 6 different file formats (.png, .jpg, .jpeg, .tif, .tiff, .bmp). The images and the script were all saved in the same directory. The size on disk of each file is noted in a comment next to its file name in the script.
The script:
import os
import re
import timeit
import magic
import imagesize
import time
from PIL import Image
"""
Notes:
- all images are the same image saved as different formats
- file extensions tested are: .png, .jpg, .jpeg, .tif, .tiff, .bmp
- all images in this test are size 636 x 636
- all images are in the same directory as this script
If you want to set up a similar experiment, take a single image,
save it as: png_image.png, jpg_image.jpg, jpeg_image.jpeg, tif_image.tif,
tiff_image.tiff, and bmp_image.bmp (others if you'd like),
in the same directory as this script, and run this script.
Or name the images whatever you like and modify the script below. You do you.
"""
NUMBER = 10000
REPEAT = 5
def regex(filename):
    # Pick the pattern that matches magic's output format for this extension.
    # Raw strings avoid invalid-escape warnings for \d and \s.
    name, ext = os.path.splitext(filename)
    if ext.lower() in ['.tif', '.tiff']:
        return r'^(?=.*width=(\d+))(?=.*height=(\d+))'
    elif ext.lower() in ['.jpg', '.jpeg', '.png']:
        return r'(\d+)\s?x\s?(\d+)'
    elif ext.lower() in ['.bmp']:
        return r'(\d+)\s?x\s?(\d+)\s?x\s?\d+'
    else:
        raise ValueError('Extension %s is not accounted for.' % ext.lower())
PNG_FILE = 'png_image.png' # 559 KB
JPG_FILE = 'jpg_image.jpg' # 94 KB
JPEG_FILE = 'jpeg_image.jpeg' # 94 KB
TIF_FILE = 'tif_image.tif' # 768 KB
TIFF_FILE = 'tiff_image.tiff' # 768 KB
BMP_FILE = 'bmp_image.bmp' # 1,581 KB
FILENAMES = [PNG_FILE, JPG_FILE, JPEG_FILE, TIF_FILE, TIFF_FILE, BMP_FILE]
now = time.time()
for filename in FILENAMES:
    print('#' * 36)
    print((" Testing %s" % filename).center(36, "#"))
    print('#' * 36)
    print('# ' + 'magic library'.center(32) + ' #')
    print(' ', 'output:', magic.from_file(filename))
    print(' ', "Size:", re.findall(regex(filename), magic.from_file(filename))[-1])
    print(' ', "Regex used:", regex(filename))
    print('# ' + 'PIL library'.center(32) + ' #')
    image = Image.open(filename)
    print(' ', image)
    print(' ', "Size:", image.size)
    print(' ', "Regex used:", 'None')
    print('# ' + 'imagesize library'.center(32) + ' #')
    image = imagesize.get(filename)
    print(' ', "Size:", image)
    print(' ', "Regex used:", 'None')
    print('-' * 30 + '\n')
print("#################################end#######################################\n")
start = time.time()
for filename in FILENAMES:
    print((" Testing %s " % filename).center(36, "#"))
    # magic library
    magic_timer = timeit.Timer(
        stmt="width, height = re.findall(pattern, magic.from_file(filename))[-1]",
        setup="import magic; import re; filename='" + filename + "'; pattern=r'" + regex(filename) + "';",
    )
    magic_timeit = magic_timer.timeit(number=NUMBER)
    magic_repeat = magic_timer.repeat(repeat=REPEAT, number=NUMBER)
    print('magic'.ljust(12) + ":", "%.15f," % magic_timeit, "%s repeat avg. : %.15f" % (REPEAT, sum(magic_repeat) / REPEAT))
    # PIL library
    pillow_timer = timeit.Timer(
        stmt="width, height = Image.open(filename).size;",
        setup="from PIL import Image; filename='" + filename + "';",
    )
    pillow_timeit = pillow_timer.timeit(number=NUMBER)
    pillow_repeat = pillow_timer.repeat(repeat=REPEAT, number=NUMBER)
    print('PIL'.ljust(12) + ":", "%.15f," % pillow_timeit, "%s repeat avg. : %.15f" % (REPEAT, sum(pillow_repeat) / REPEAT))
    # imagesize library
    imagesize_timer = timeit.Timer(
        stmt="width, height = imagesize.get(filename);",
        setup="import imagesize; filename='" + filename + "';",
    )
    imagesize_timeit = imagesize_timer.timeit(number=NUMBER)
    imagesize_repeat = imagesize_timer.repeat(repeat=REPEAT, number=NUMBER)
    print('imagesize'.ljust(12) + ":", "%.15f," % imagesize_timeit, "%s repeat avg. : %.15f" % (REPEAT, sum(imagesize_repeat) / REPEAT))
stop = time.time()
mins, secs = divmod(stop - start, 60)
print('\nTest time: %d minutes %d seconds' % (mins, secs))
print("\n#################################end#######################################\n")
The output:
####################################
####### Testing png_image.png#######
####################################
# magic library #
output: PNG image data, 636 x 636, 8-bit/color RGB, non-interlaced
Size: ('636', '636')
Regex used: (\d+)\s?x\s?(\d+)
# PIL library #
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=636x636 at 0x1EBDE962710>
Size: (636, 636)
Regex used: None
# imagesize library #
Size: (636, 636)
Regex used: None
------------------------------
####################################
####### Testing jpg_image.jpg#######
####################################
# magic library #
output: JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, baseline, precision 8, 636x636, frames 3
Size: ('636', '636')
Regex used: (\d+)\s?x\s?(\d+)
# PIL library #
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=636x636 at 0x1EBDF3E1810>
Size: (636, 636)
Regex used: None
# imagesize library #
Size: (636, 636)
Regex used: None
------------------------------
####################################
###### Testing jpeg_image.jpeg######
####################################
# magic library #
output: JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, baseline, precision 8, 636x636, frames 3
Size: ('636', '636')
Regex used: (\d+)\s?x\s?(\d+)
# PIL library #
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=636x636 at 0x1EBDF3E3010>
Size: (636, 636)
Regex used: None
# imagesize library #
Size: (636, 636)
Regex used: None
------------------------------
####################################
####### Testing tif_image.tif#######
####################################
# magic library #
output: TIFF image data, little-endian, direntries=16, height=636, bps=63732, compression=LZW, PhotometricIntepretation=RGB, width=636
Size: ('636', '636')
Regex used: ^(?=.*width=(\d+))(?=.*height=(\d+))
# PIL library #
<PIL.TiffImagePlugin.TiffImageFile image mode=RGBA size=636x636 at 0x1EBDF3E1810>
Size: (636, 636)
Regex used: None
# imagesize library #
Size: (636, 636)
Regex used: None
------------------------------
####################################
###### Testing tiff_image.tiff######
####################################
# magic library #
output: TIFF image data, little-endian, direntries=16, height=636, bps=63732, compression=LZW, PhotometricIntepretation=RGB, width=636
Size: ('636', '636')
Regex used: ^(?=.*width=(\d+))(?=.*height=(\d+))
# PIL library #
<PIL.TiffImagePlugin.TiffImageFile image mode=RGBA size=636x636 at 0x1EBDF3E3160>
Size: (636, 636)
Regex used: None
# imagesize library #
Size: (636, 636)
Regex used: None
------------------------------
####################################
####### Testing bmp_image.bmp#######
####################################
# magic library #
output: PC bitmap, Windows 3.x format, 636 x 636 x 32
Size: ('636', '636')
Regex used: (\d+)\s?x\s?(\d+)\s?x\s?\d+
# PIL library #
<PIL.BmpImagePlugin.BmpImageFile image mode=RGB size=636x636 at 0x1EBDF3E31F0>
Size: (636, 636)
Regex used: None
# imagesize library #
Size: (-1, -1)
Regex used: None
------------------------------
#################################end#######################################
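The timing comparisons of each library / method.
I set timeit to run each statement 10,000 times, with a repeat of 5, so each figure below is the total time in seconds for 10,000 calls. For reference, the whole test took 7 minutes 46 seconds to run.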
###### Testing png_image.png #######
magic : 9.280310999951325 , 5 repeat avg. : 8.674063340038993
PIL : 1.069168900023215 , 5 repeat avg. : 1.100983139988966
imagesize : 0.676764299976639 , 5 repeat avg. : 0.658798480057158
###### Testing jpg_image.jpg #######
magic : 7.006248699966818 , 5 repeat avg. : 6.803474060003646
PIL : 1.295019199955277 , 5 repeat avg. : 1.230920840008184
imagesize : 0.709322200040333 , 5 repeat avg. : 0.706342480005696
##### Testing jpeg_image.jpeg ######
magic : 6.531979499966837 , 5 repeat avg. : 6.501230620010756
PIL : 1.263985900091939 , 5 repeat avg. : 1.263613799982704
imagesize : 0.666680400026962 , 5 repeat avg. : 0.701455319998786
###### Testing tif_image.tif #######
magic : 11.265482199960388, 5 repeat avg. : 11.423775779991411
PIL : 3.702962300041690 , 5 repeat avg. : 3.857250300026499
imagesize : 0.764358000014909 , 5 repeat avg. : 0.750753180007450
##### Testing tiff_image.tiff ######
magic : 11.288321400061250, 5 repeat avg. : 11.339019200019539
PIL : 4.116472600027919 , 5 repeat avg. : 3.834464759984985
imagesize : 0.753993199905381 , 5 repeat avg. : 0.758465819992125
###### Testing bmp_image.bmp #######
magic : 16.124460300081410, 5 repeat avg. : 16.291060140007176
PIL : 0.919579099980183 , 5 repeat avg. : 0.928753740014508
imagesize : 0.649574000039138 , 5 repeat avg. : 0.654250180022791
Test time: 7 minutes 46 seconds
#################################end#######################################
Note: I'm no timing expert, so if my timing approach seems invalid, please point it out.
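One possible refinement, offered as a sketch rather than a correction: the timeit documentation suggests looking at the minimum of the repeats instead of the average, since the slower runs mostly reflect interference from other processes. Dividing by the number of calls also turns the totals into a per-call figure that is easier to compare. `per_call_microseconds` is a hypothetical helper name, and the statement timed in the example is a stand-in, not one of the three libraries.

```python
import timeit


def per_call_microseconds(stmt, setup="pass", number=10_000, repeat=5,
                          namespace=None):
    """Time stmt and return the best per-call time in microseconds.

    timeit.repeat returns one total (in seconds) per repeat; the
    smallest total is taken as the least disturbed measurement.
    """
    totals = timeit.repeat(stmt=stmt, setup=setup, repeat=repeat,
                           number=number, globals=namespace)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    # Stand-in statement; swap in e.g. "Image.open(filename).size"
    # with a matching setup string to time the real calls.
    best = per_call_microseconds("sum(range(100))", number=1000, repeat=3)
    print("best per-call time: %.3f microseconds" % best)
```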