
I have a few million images stored as jpgs. I'd like to reduce the size of each jpg by 80%. Here's a bash loop that I'm currently using (I'm on MacOS):

for i in *jpg; do convert "$i" -quality 80% "${i%.jpg}.jpg"; done; 

The above line converts the images sequentially. Is there a way to parallelize and thus speed up this conversion? I don't need to use bash, just want to find the fastest way to make the conversion.

mmz
  • Look at [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). The idea is to share the work between multiple processes: each one gets a list of file names and works on them in parallel. You will be able to call `convert` from Python code – balderman Nov 13 '21 at 14:34
  • @balderman is there a python package that offers functionality similar to `convert`? – mmz Nov 13 '21 at 14:35
  • Take a look at **PIL** –  Nov 13 '21 at 14:47
  • I think I would try GNU parallel for a simple shell script. – Lord Bo Nov 13 '21 at 14:57
  • @mmz I think the most recent python library wrapper for imagemagick is wand (https://docs.wand-py.org/). – Lord Bo Nov 13 '21 at 15:02
  • The premise is fundamentally **incorrect** that specifying `-quality 80%` will reduce the file size by 80%. It could either increase or decrease the file size depending on what the quality was beforehand. – Mark Setchell Nov 17 '21 at 22:22
  • The idea of starting one **ImageMagick** process for each JPEG file may be completely flawed depending on your disk subsystem and the initial size of your images. You might be significantly better off using **GNU Parallel** with its `-X` option and invoking `magick mogrify`. – Mark Setchell Nov 17 '21 at 22:28
  • Have a read here... https://stackoverflow.com/a/51822265/2836621 – Mark Setchell Nov 17 '21 at 22:31
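
A minimal sketch of the batching idea from the comment above, assuming GNU Parallel and ImageMagick 7 (the `magick` front-end) are installed; `resized/` is a hypothetical output directory so the originals are left untouched:

mkdir -p resized
# -X packs as many file names as fit onto each mogrify invocation,
# so far fewer ImageMagick processes are started
printf '%s\0' *.jpg | parallel -0 -X magick mogrify -path resized -quality 80 {}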

3 Answers


Using Python you can do it this way:

import glob
import os
import subprocess
from tqdm.contrib.concurrent import thread_map

def reduce_file(filepath):
    # images/foo.jpg -> images/foo_reduced.jpg
    base, _ = os.path.splitext(filepath)
    output = f"{base}_reduced.jpg"
    # Pass the arguments as a list so file names containing spaces are handled safely
    subprocess.run(["convert", filepath, "-quality", "80%", output])

# thread_map runs the calls concurrently and shows a progress bar; threads are
# enough here because the heavy lifting happens in the convert subprocesses
list(thread_map(reduce_file, glob.glob("./images/*.jpg")))

This assumes your images match images/*.jpg.

Omar Aflak

Parallelise the execution of convert with GNU xargs. This runs up to 10 convert processes at the same time and starts a new one whenever a running process finishes, so that 10 are kept busy until all files have been processed.

printf "%s\n" *.jpg | xargs -P 10 -I {} convert {} -quality 80% {}

xargs replaces each {} in the convert command with a file name read from stdin.

I assume that your file names do not contain a line break. The original files are overwritten.
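
If some file names could contain newlines, a null-delimited variant is safer; the BSD xargs that ships with macOS also understands -0 and -P. A sketch, still overwriting the originals:

printf '%s\0' *.jpg | xargs -0 -P 10 -I {} convert {} -quality 80% {}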

Cyrus

Using GNU Parallel it looks like this:

parallel convert {} -quality 80% {.}_80.jpg ::: *jpg 

If the million jpgs are all in the same dir, the above line will be too long. In that case, try:

printf '%s\0' *.jpg | parallel -0 convert {} -quality 80% {.}_80.jpg
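
If you would rather keep the originals and collect the reduced copies in a separate directory (out/ is a hypothetical name), GNU Parallel's {/.} replacement string (basename without extension) allows something like:

mkdir -p out
printf '%s\0' *.jpg | parallel -0 convert {} -quality 80% out/{/.}_80.jpg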
Ole Tange