
I'm trying to batch-resize (i.e., reduce the file size of) thousands of images using R. I've managed to achieve this using the code below, but it takes ages (especially when resizing >50,000 images). Is there any way that this task could be run on multiple cores? I'm a complete novice at parallel computing, so any assistance would be greatly appreciated. Thanks in advance!

library(imager)

# List the full paths of every image in the directory
pages <- list.files(path = '...insert directory path...',
                    full.names = TRUE)

for (x in seq_along(pages)) {
  file <- load.image(pages[x])

  # Scale each dimension to 39.0625% of the original
  resized <- imresize(file, scale = 0.390625)

  # Save under the same name, with the "JPG" extension lowercased
  save.image(resized, file = gsub("JPG", "jpg", pages[x]))
}
Ross
  • Have a look at e.g. the foreach package: ftp://cran.r-project.org/pub/R/web/packages/foreach/vignettes/foreach.pdf – Rentrop Oct 14 '16 at 05:35 (a sketch of this in-R approach follows these comments)
  • You could use [snow and snowfall packages](https://www.r-bloggers.com/parallel-computing-in-r-snowfallsnow/) for parallel processing. – Blacksad Oct 14 '16 at 07:58
  • Have you considered using a command line tool? This might be faster. E.g. https://playingwithsid.blogspot.ch/2010/08/how-to-resize-photos-with-bash-shell.html – tobiasegli_te Oct 14 '16 at 09:17
  • What OS are you using? – Mark Setchell Oct 14 '16 at 09:49
  • Thanks for everyone's suggestions. I use a Mac (running OS X El Capitan). I have tried using ImageMagick to do this task (using mogrify), but I found it was even slower. I haven't yet tried the above suggestion by @tobiasegli_te though. It would be great to keep everything within R. – Ross Oct 14 '16 at 11:19
  • In case it is faster, you can run command-line tools using `system()` to keep your analysis in R – tobiasegli_te Oct 14 '16 at 11:21
  • 1
    Thanks @tobiasegli_te. I managed to get it to work, and it's a lot faster than my original code, but Mark's suggestion using GNU parallel was even faster. Thanks again for your help! – Ross Oct 14 '16 at 12:03
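
For what it's worth, here is a minimal sketch of the in-R route these comments point at, using the base `parallel` package's `mclapply` rather than foreach or snow (same idea, fork-based on macOS/Linux; `resize_one` is a hypothetical helper name, and the directory path is left elided as in the question):

library(imager)
library(parallel)   # ships with base R

pages <- list.files(path = '...insert directory path...',
                    full.names = TRUE)

# Hypothetical helper wrapping a single resize, so each worker handles one file
resize_one <- function(p) {
  img <- load.image(p)
  resized <- imresize(img, scale = 0.390625)
  save.image(resized, file = gsub("JPG", "jpg", p))
  invisible(NULL)
}

# mclapply forks one worker process per core (mc.cores sets how many);
# forking is unavailable on Windows, where parLapply would be needed instead
mclapply(pages, resize_one, mc.cores = detectCores())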

1 Answer


The secret weapon is GNU Parallel. Install it with Homebrew:

brew install parallel

Now, make an output directory so your input images don't get clobbered, and run a batch of mogrify commands in parallel (`-X` tells GNU Parallel to pack as many filenames as will fit onto each mogrify invocation):

mkdir results
parallel -X mogrify -resize 39% -path results ::: *.jpg

Please make a backup until you get the hang of this!
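
If you would rather keep everything in R, as the comments on the question suggest, the same pipeline can be shelled out with `system()`. A rough sketch, assuming `parallel` and `mogrify` are on your PATH and that R's working directory is the folder of images:

# Create the output directory, then hand the whole batch to GNU Parallel;
# system() passes the string to the shell, so the *.jpg glob expands there
dir.create("results", showWarnings = FALSE)
system("parallel -X mogrify -resize 39% -path results ::: *.jpg")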

Benchmark

I made 1,000 JPEGs of 400x400 pixels, full of random noise, and converted them sequentially with:

time for i in *.jpg; do convert $i -resize 39% results/$i; done

real    0m19.086s
user    0m18.615s
sys     0m3.445s

And then in parallel:

time parallel -X mogrify -resize 39% -path results ::: *.jpg

real    0m3.351s
user    0m23.021s
sys     0m0.706s
Mark Setchell
  • Excellent - I'm glad it worked out for you! Turn on `Activity Monitor` and press `CMD-2` and `CMD-3` while you are running it to remember why you paid the extra for that Intel corei7 :-) – Mark Setchell Oct 14 '16 at 12:35
  • Will do! I've got another potential challenge, which aims to batch blur images in parallel (http://stackoverflow.com/questions/40044958/batch-blur-images-using-multiple-cores). Appreciate any help if possible. Thanks again! – Ross Oct 14 '16 at 13:56
  • Hi @Mark Setchell, I'm getting the same error message as in the other example (-bash: /usr/local/bin/parallel: Argument list too long). My attempts to modify the code in the same way as in the other example aren't working. – Ross Oct 15 '16 at 11:48
  • It would be easiest to do the modification *"in-place"* if you can - i.e. by modifying the originals, so do the following ON A COPY. `cd /path/to/images; find . -name \*.jpg -print0 | parallel -0 -X mogrify -resize 39% {}` – Mark Setchell Oct 15 '16 at 11:51