
I need to automate some image transformations to do the following:

- read in 16,000+ images that are short and wide; the sizes are not all the same
- rescale each image to 90 pixels high
- crop 90 pixels at a time across the width of the image, giving multiple 90x90 crops from one image
- then do it all over again for the next image
- save each 90x90 image as file-name_1.png, file-name_2.png, and so on in sequential order

I've completed a test on 8 images, and using the magick package I was able to rescale and create multiple crops from each image manually. The problem is when I try to do multiple, I am able to resize the images easily but when it comes to saving them there is a problem.

# capture images, file paths in a list
img_list <- list.files("./orig_images", pattern = "\\.png$", full.names = TRUE)

# get all images in a list
all_images <- lapply(img_list, image_read)

# scale each image height - THIS DOESN'T WORK, GET NULL VALUE
scale_images <- 
  for (i in 1:length(all_images)) {
  scale_images(all_images[[i]], "x90")
    }

# all images added into one
all_images_joined <- image_join(all_images)

# scale images - THIS WORKS to scale, but problems later
all_images_scaled <- 
  image_scale(all_images_joined, "x90")

# Test whether a single file will be written or multiple files; 
# only writes one file (even if I 
for (i in 1:length(all_images_scaled)) {
  image_write(all_images_scaled[[i]], path = "filepath/new_cropimages/filename")
}

Ideally, I would scale the images with a for loop so that I can save the scaled images to a directory. This didn't work: I don't get an error, but when I check the contents of the variable it is NULL. The image_join function puts them all together and scales the height to 90 (the width is scaled proportionately), but I can't write the separate images to a directory.

The next piece is to crop each image across the width and save the new images as file-name_1.png and so on: crop 90x90, move over 90 pixels, crop 90x90 again, and repeat for every image. I chose magick because it was easy to scale and crop individual images, but I'm open to other ideas (or to learning how to make that package work). Thanks for any help.

Here are some images:

[Original Image, untransformed][1]
[Manual 90x90 crop][2]
[Another manual 90x90 crop, farther down the same image][3]


  [1]: https://i.stack.imgur.com/8ptXv.png
  [2]: https://i.stack.imgur.com/SF9pG.png
  [3]: https://i.stack.imgur.com/NyKxS.png
techster
  • I think a diagram would help enormously... showing a couple of different size input images and what output images would result. Also, what OS are you using? – Mark Setchell Jun 05 '19 at 06:15
  • Thank you for your suggestions! I have added some images that will hopefully help. – techster Jun 05 '19 at 21:16
  • So what happens when you resize the image to 90 px high and it comes out at 110 px wide? You'll get one image 90x90 and one image 20x90 ? What OS are you using? – Mark Setchell Jun 05 '19 at 21:39
  • So of the 16000 sentences (images), the smallest I came across was 90 x 680 after scaling the height to 90px. I may just discard the last image when it is less than 90x90, because I'll have so many other images that it could even out. Or I might try to find the images that aren't 90px wide and do some kind of manipulation to make them square. I'm not sure how to handle them. I'm working in R (and RStudio), but it's installed on a Mac if that helps. – techster Jun 05 '19 at 22:57
  • Yes, both answers worked like a charm! I'll check it, I'm very new to stack overflow so thank you for letting me know what to do next, sorry it took me so long to see it. – techster Jun 14 '19 at 22:32

2 Answers


I don't speak R, but I hope to be able to help with the ImageMagick aspects and getting 16,000 images processed.

As you are on a Mac, you can install 2 very useful packages very easily with homebrew, using:

brew install imagemagick
brew install parallel

So, your original sentence image is 1850x105 pixels, you can see that in Terminal like this:

magick identify sentence.png
sentence.png PNG 1850x105 1850x105+0+0 8-bit Gray 256c 51626B 0.000u 0:00.000

If you resize the height to 90px, leaving the width to follow proportionally, it will become 1586x90px:

magick sentence.png -resize x90 info:
sentence.png PNG 1586x90 1586x90+0+0 8-bit Gray 51626B 0.060u 0:00.006

So, if you resize and then crop into 90px wide chunks:

magick sentence.png -resize x90 -crop 90x chunk-%03d.png

you will get 18 chunks, each 90 px wide except the last, as follows:

-rw-r--r--  1 mark  staff  5648  6 Jun 08:07 chunk-000.png
-rw-r--r--  1 mark  staff  5319  6 Jun 08:07 chunk-001.png
-rw-r--r--  1 mark  staff  5870  6 Jun 08:07 chunk-002.png
-rw-r--r--  1 mark  staff  6164  6 Jun 08:07 chunk-003.png
-rw-r--r--  1 mark  staff  5001  6 Jun 08:07 chunk-004.png
-rw-r--r--  1 mark  staff  6420  6 Jun 08:07 chunk-005.png
-rw-r--r--  1 mark  staff  4726  6 Jun 08:07 chunk-006.png
-rw-r--r--  1 mark  staff  5559  6 Jun 08:07 chunk-007.png
-rw-r--r--  1 mark  staff  5053  6 Jun 08:07 chunk-008.png
-rw-r--r--  1 mark  staff  4413  6 Jun 08:07 chunk-009.png
-rw-r--r--  1 mark  staff  5960  6 Jun 08:07 chunk-010.png
-rw-r--r--  1 mark  staff  5392  6 Jun 08:07 chunk-011.png
-rw-r--r--  1 mark  staff  4280  6 Jun 08:07 chunk-012.png
-rw-r--r--  1 mark  staff  5681  6 Jun 08:07 chunk-013.png
-rw-r--r--  1 mark  staff  5395  6 Jun 08:07 chunk-014.png
-rw-r--r--  1 mark  staff  5065  6 Jun 08:07 chunk-015.png
-rw-r--r--  1 mark  staff  6322  6 Jun 08:07 chunk-016.png
-rw-r--r--  1 mark  staff  4848  6 Jun 08:07 chunk-017.png
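
The count of 18 follows from integer arithmetic on the resized width. As a minimal sketch (the variable names are illustrative and not part of ImageMagick), the shell can reproduce it:

```shell
# Width after resizing, taken from the -resize x90 output above
width=1586
tile=90

full=$(( width / tile ))                  # complete 90 px tiles
rest=$(( width % tile ))                  # width of the final partial tile
total=$(( (width + tile - 1) / tile ))    # ceiling division: tiles written

echo "full=$full rest=$rest total=$total"
```

This prints `full=17 rest=56 total=18`, so the last chunk is only 56 px wide.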

Now, if you have 16,000 sentences to process, you can use GNU Parallel to get them all done in parallel and also get sensible names for all the files. Let's do a dry-run first so it actually doesn't do anything, but just shows you what it would do:

parallel --dry-run magick {} -resize x90 -crop 90x {.}-%03d.png ::: sentence*

Sample Output

magick sentence1.png -resize x90 -crop 90x sentence1-%03d.png 
magick sentence2.png -resize x90 -crop 90x sentence2-%03d.png
magick sentence3.png -resize x90 -crop 90x sentence3-%03d.png

That looks good, so remove the --dry-run and run it again. You get the following output for the three identical copies of your sentence that I made:

-rw-r--r--  1 mark  staff  5648  6 Jun 08:13 sentence1-000.png
-rw-r--r--  1 mark  staff  5319  6 Jun 08:13 sentence1-001.png
-rw-r--r--  1 mark  staff  5870  6 Jun 08:13 sentence1-002.png
-rw-r--r--  1 mark  staff  6164  6 Jun 08:13 sentence1-003.png
-rw-r--r--  1 mark  staff  5001  6 Jun 08:13 sentence1-004.png
-rw-r--r--  1 mark  staff  6420  6 Jun 08:13 sentence1-005.png
-rw-r--r--  1 mark  staff  4726  6 Jun 08:13 sentence1-006.png
-rw-r--r--  1 mark  staff  5559  6 Jun 08:13 sentence1-007.png
-rw-r--r--  1 mark  staff  5053  6 Jun 08:13 sentence1-008.png
-rw-r--r--  1 mark  staff  4413  6 Jun 08:13 sentence1-009.png
-rw-r--r--  1 mark  staff  5960  6 Jun 08:13 sentence1-010.png
-rw-r--r--  1 mark  staff  5392  6 Jun 08:13 sentence1-011.png
-rw-r--r--  1 mark  staff  4280  6 Jun 08:13 sentence1-012.png
-rw-r--r--  1 mark  staff  5681  6 Jun 08:13 sentence1-013.png
-rw-r--r--  1 mark  staff  5395  6 Jun 08:13 sentence1-014.png
-rw-r--r--  1 mark  staff  5065  6 Jun 08:13 sentence1-015.png
-rw-r--r--  1 mark  staff  6322  6 Jun 08:13 sentence1-016.png
-rw-r--r--  1 mark  staff  4848  6 Jun 08:13 sentence1-017.png
-rw-r--r--  1 mark  staff  5648  6 Jun 08:13 sentence2-000.png
-rw-r--r--  1 mark  staff  5319  6 Jun 08:13 sentence2-001.png
-rw-r--r--  1 mark  staff  5870  6 Jun 08:13 sentence2-002.png
-rw-r--r--  1 mark  staff  6164  6 Jun 08:13 sentence2-003.png
-rw-r--r--  1 mark  staff  5001  6 Jun 08:13 sentence2-004.png
-rw-r--r--  1 mark  staff  6420  6 Jun 08:13 sentence2-005.png
-rw-r--r--  1 mark  staff  4726  6 Jun 08:13 sentence2-006.png
-rw-r--r--  1 mark  staff  5559  6 Jun 08:13 sentence2-007.png
-rw-r--r--  1 mark  staff  5053  6 Jun 08:13 sentence2-008.png
-rw-r--r--  1 mark  staff  4413  6 Jun 08:13 sentence2-009.png
-rw-r--r--  1 mark  staff  5960  6 Jun 08:13 sentence2-010.png
-rw-r--r--  1 mark  staff  5392  6 Jun 08:13 sentence2-011.png
-rw-r--r--  1 mark  staff  4280  6 Jun 08:13 sentence2-012.png
-rw-r--r--  1 mark  staff  5681  6 Jun 08:13 sentence2-013.png
-rw-r--r--  1 mark  staff  5395  6 Jun 08:13 sentence2-014.png
-rw-r--r--  1 mark  staff  5065  6 Jun 08:13 sentence2-015.png
-rw-r--r--  1 mark  staff  6322  6 Jun 08:13 sentence2-016.png
-rw-r--r--  1 mark  staff  4848  6 Jun 08:13 sentence2-017.png
-rw-r--r--  1 mark  staff  5648  6 Jun 08:13 sentence3-000.png
-rw-r--r--  1 mark  staff  5319  6 Jun 08:13 sentence3-001.png
-rw-r--r--  1 mark  staff  5870  6 Jun 08:13 sentence3-002.png
-rw-r--r--  1 mark  staff  6164  6 Jun 08:13 sentence3-003.png
-rw-r--r--  1 mark  staff  5001  6 Jun 08:13 sentence3-004.png
-rw-r--r--  1 mark  staff  6420  6 Jun 08:13 sentence3-005.png
-rw-r--r--  1 mark  staff  4726  6 Jun 08:13 sentence3-006.png
-rw-r--r--  1 mark  staff  5559  6 Jun 08:13 sentence3-007.png
-rw-r--r--  1 mark  staff  5053  6 Jun 08:13 sentence3-008.png
-rw-r--r--  1 mark  staff  4413  6 Jun 08:13 sentence3-009.png
-rw-r--r--  1 mark  staff  5960  6 Jun 08:13 sentence3-010.png
-rw-r--r--  1 mark  staff  5392  6 Jun 08:13 sentence3-011.png
-rw-r--r--  1 mark  staff  4280  6 Jun 08:13 sentence3-012.png
-rw-r--r--  1 mark  staff  5681  6 Jun 08:13 sentence3-013.png
-rw-r--r--  1 mark  staff  5395  6 Jun 08:13 sentence3-014.png
-rw-r--r--  1 mark  staff  5065  6 Jun 08:13 sentence3-015.png
-rw-r--r--  1 mark  staff  6322  6 Jun 08:13 sentence3-016.png
-rw-r--r--  1 mark  staff  4848  6 Jun 08:13 sentence3-017.png

A word of explanation about the parameters to parallel:

  • {} refers to "the current file"
  • {.} refers to "the current file without its extension"
  • ::: separates the parameters meant for parallel from those meant for your magick command
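
To make the two replacement strings concrete, here is a sketch of the same dry run written as a plain sequential loop. The helper name `dry_run_cmd` and the file names are made up for illustration, and `${f%.*}` only approximates `{.}` for simple file names like these:

```shell
# Hypothetical helper: print the magick command for one input file,
# mimicking what `parallel --dry-run` shows. Nothing is executed.
dry_run_cmd() {
  f=$1                 # plays the role of {}
  stem=${f%.*}         # plays the role of {.}: drop the last extension
  printf 'magick %s -resize x90 -crop 90x %s-%%03d.png\n' "$f" "$stem"
}

# Example file names, made up for illustration
for f in sentence1.png sentence2.png sentence3.png; do
  dry_run_cmd "$f"
done
```

The loop prints one command per file, in the same shape as the sample output above. The difference with GNU Parallel is that it runs those commands concurrently instead of one after another.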

One note of warning: PNG images can "remember" where they came from, which can be useful or very annoying. If you look at the last chunk from above, you will see it is 56x90, but it also "remembers" that it came from a canvas of 1586x90 at offset 1530,0:

identify sentence3-017.png 
sentence3-017.png PNG 56x90 1586x90+1530+0 8-bit Gray 256c 4848B 0.000u 0:00.000

This can sometimes upset subsequent processing which is annoying, or sometimes be very useful in re-assembling images that have been chopped up! If you want to remove it, you need to repage, so the command above becomes:

magick input.png -resize x90 -crop 90x +repage output.png 
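
The remembered offset is simply the tile's left edge on the resized canvas. As a rough sketch (the helper `tile_geometry` is hypothetical and only does arithmetic; it does not call ImageMagick), the size and offset of any chunk can be predicted like this:

```shell
# Hypothetical helper: size and offset of tile number $1 (0-based) cut
# from a canvas $2 px wide, using 90 px wide tiles that are 90 px high.
tile_geometry() {
  i=$1
  width=$2
  tile=90
  x=$(( i * tile ))            # left edge of the tile on the canvas
  w=$(( width - x ))           # pixels remaining from that edge
  if [ "$w" -gt "$tile" ]; then
    w=$tile                    # every tile but the last is full width
  fi
  printf '%sx%s+%s+0\n' "$w" "$tile" "$x"
}

tile_geometry 0 1586     # first chunk
tile_geometry 17 1586    # last chunk
```

For the last chunk this gives `56x90+1530+0`, matching the 56x90 size and the +1530+0 offset that identify reports before repaging.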
Mark Setchell
  • Wow, thank you for this amazing and comprehensive answer! I really appreciate it, I will test it first thing tonight - I've been trying so hard to automate but was only finding pieces of the answer. – techster Jun 06 '19 at 10:43
  • Mark's answer is terrific and I would be inclined to solve this problem with ImageMagick. But since you asked for an R solution, here's one as well! – David O Jun 06 '19 at 15:34

Updated - to make better use of the tools in EBImage

ImageMagick is a great approach. But should you want to perform some content analysis on the images, here is a solution in R, which provides some pretty handy tools. Images are "nothing" but matrices, and R handles matrices really well. The EBImage package reduces images to matrices and, for better or for worse, removes some of the metadata from each image in the process. Here's an R solution with EBImage. Again though, Mark's solution may be better for really big production runs.

The solution is structured around a large "for" loop. It would be prudent to add error checking at several steps. The code takes advantage of EBImage to manage both color and grayscale images.

Here, the final image is centered in an extended image by adding pixels of the desired background color. The extended image is then cropped into tiles. The logic determining the value for pad can be adjusted to simply crop the image or left justify or right justify it, if desired.
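
The padding arithmetic can be checked independently of R. This is a sketch in shell that assumes the 1586 px resized width from the first answer. The variable names mirror the R code below, except `left`, which is introduced here for the centering shift:

```shell
# Width of the resized image, reusing the 1586 px example from the first answer
width=1586
size=90

nx=$(( (width + size - 1) / size ))   # number of tiles, like ceiling(width / size)
new_width=$(( nx * size ))            # width of the extended image
pad=$(( new_width - width ))          # pixels to add in total
left=$(( pad / 2 ))                   # shift used to center, like pad %/% 2

echo "nx=$nx new_width=$new_width pad=$pad left=$left"
```

This prints `nx=18 new_width=1620 pad=34 left=17`: the image is extended to 1620 px and shifted 17 px to the right, so all 18 tiles come out at a full 90x90.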

It starts by assuming you begin in the working directory with the source files in ./source and the destination to be in ./dest. It also creates a new directory for each "tiled" image. That could be changed to have a single directory receive all the images as well as other protective coding. Here, the images are assumed to be PNG files with an appropriate extension. The desired tile size (90) to be applied to both height and width is stored in the variable size.

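# EBImage needs to be available (it is installed from Bioconductor;
# BiocManager replaces the retired biocLite script)
  if (!require(EBImage)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install("EBImage")
    library(EBImage)
  }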

# From the working directory, select image files
  size <- 90
  bg.col <- "transparent" # or any other color specification for R
  ff <- list.files("source", full.names = TRUE,
    pattern = "\\.png$", ignore.case = TRUE)

# Walk through all files with a 'for' loop
  for (f in ff) {
    # Extract base name, even names like "foo.bar.1.png" 
      txt <- unlist(strsplit(basename(f), ".", fixed = TRUE))
      len <- length(txt)
      base <- ifelse(len == 1, txt[1], paste(txt[-len], collapse = "."))

    # Read one image and resize
      img <- readImage(f)
      img <- resize(img, h = size) # options allow for antialiasing

    # Determine number tiles and padding needed
      nx <- ceiling(dim(img)[1]/size)
      newdm <- c(nx * size, size) # extend final image
      pad <- newdm[1] - dim(img)[1] # pixels needed to extend 

    # Translate the image with the given background fill
      img <- translate(img, c(pad%/%2, 0), output.dim = newdm, bg.col = bg.col)

    # Split image into appropriate sized tiles with 'untile'
      img <- untile(img, c(nx, 1), lwd = 0) # see the help file

    # Create a new directory for each image
      dpath <- file.path("dest", trimws(base)) # Windows doesn't like " "
      if (!dir.create(dpath))
        stop("unable to create directory: ", dpath)
      
    # Create new image file names for each frame
      fn <- sprintf("%s_%03d.png", base, seq_len(nx))
      fpaths <- file.path(dpath, fn)

    # Save individual tiles (as PNG) and names of saved files
      saved <- mapply(writeImage, x = getFrames(img, type = "render"), 
        files = fpaths)

    # Check on the results from 'mapply'
      print(saved)
  }
David O
  • Thank you, I really appreciate it! I'm definitely happy to have 2 different, well described ways to do it. It was driving me crazy that I couldn't figure out how to do it with the R magick package. – techster Jun 07 '19 at 15:22
  • You're welcome! I do use ImageMagick at times from the command line and especially like it for animations. I was happy to see the port of it to R (`magick`) by Jeroen Ooms but found it easier to stick with the command line. I've replaced my kludgy code with better use of the native tools in `EBImage`. I look forward to seeing if it works for you! – David O Jun 07 '19 at 17:23