I'm working on a project that clusters images of defects. Each image is associated with a specific defect type and is read with `readJPEG()` into a 3D array of pixel values.

An example image: https://i.stack.imgur.com/pO9XY.jpg

library(jpeg)
im <- readJPEG("C:/Users/Rayane_2/Desktop/Data/PCB1/PCB/PCB_USED/01.jpg")
dim(im)
[1] 1586 3034    3

The desired process is as follows.

For each picture in a specific directory:

1/ Convert the JPG picture to a 3D array (the RGB data of a JPG image is a 3D array).

2/ Summarize that 3D array in a **vector** of statistics using a function like `stats()`.

3/ Return this vector and continue building the full clustering dataset.
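
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: `summarise_channels()` and `build_dataset()` are hypothetical names, the directory path and file pattern in the usage comment are placeholders, and the reader defaults to `jpeg::readJPEG` but can be swapped for any function that returns a height x width x channel array.

```r
# Hypothetical helper: per-channel min, max and sum of an H x W x C array,
# flattened channel by channel into one numeric vector.
summarise_channels <- function(im) {
  as.vector(apply(im, 3, function(x) c(min(x), max(x), sum(x))))
}

# Hypothetical helper: one row of statistics per image file.
# `reader` defaults to jpeg::readJPEG (needs the jpeg package installed).
build_dataset <- function(files, reader = jpeg::readJPEG,
                          summarise = summarise_channels) {
  stopifnot(length(files) > 0)
  rows <- lapply(files, function(f) summarise(reader(f)))
  dataset <- do.call(rbind, rows)
  rownames(dataset) <- basename(files)
  dataset
}

# Usage (placeholder directory, not from the question):
# files <- list.files("path/to/images", pattern = "\\.jpe?g$",
#                     full.names = TRUE, ignore.case = TRUE)
# dataset <- build_dataset(files)
```

Passing the reader and the summary function as arguments keeps the loop itself independent of which statistics end up in the clustering dataset.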

I would like to convert `im[,,1]`, `im[,,2]` and `im[,,3]` to vectors with `as.vector()`.

After that I need to extract some statistics, something like:

stats <- function(im) {
  # Summarize one colour channel (a matrix) as a numeric vector.
  # range() repeats min and max, so each channel yields six values.
  channel_stats <- function(x) {
    x <- as.vector(x)
    c(min(x), max(x), sum(x), range(x), var(x))
  }
  # Concatenate the summaries of channels 1, 2 and 3 in that order.
  c(channel_stats(im[,,1]), channel_stats(im[,,2]), channel_stats(im[,,3]))
}

Similar statistics can be obtained with R packages, such as `descr()` in {summarytools}; see R statistics package.

Because the `im` 3D array is large, this runs very slowly:

dim(im)
[1] 1586 3034    3
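
One possible base-R variation, offered as a sketch rather than a measured speed-up: view the array as a matrix with one column per channel and compute the statistics column-wise. `channel_matrix_stats()` is a hypothetical name; whether it is actually faster on a 1586 x 3034 x 3 array would need to be checked with `system.time()`.

```r
# Hypothetical helper: summarise every channel of an H x W x C array
# after reshaping it to an (H * W) x C matrix.
channel_matrix_stats <- function(im) {
  d <- dim(im)
  n <- d[1] * d[2]
  # Arrays are stored column-major, so the first n values are channel 1,
  # the next n values channel 2, and so on.
  m <- matrix(im, nrow = n, ncol = d[3])
  means <- colMeans(m)
  rbind(
    min = apply(m, 2, min),
    max = apply(m, 2, max),
    sum = colSums(m),
    # Sample variance, the same definition that var() uses.
    var = colSums(sweep(m, 2, means)^2) / (n - 1)
  )
}

# Timing comparison on a real image (not run here):
# system.time(stats(im))
# system.time(channel_matrix_stats(im))
```

The result has one row per statistic and one column per channel.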

Question:

Are there other R functions or packages that can do this task much faster?

Thanks,

Tou Mou
  • You can just loop over the third dimension, i.e. `apply(im, 3, \(x) c(min = min(x), max = max(x), sum = sum(x)))` – akrun Jul 30 '22 at 19:09
  • @akrun, nice to see you again. This gives the error > Error: unexpected input in "apply(im, 3, \" – Tou Mou Jul 30 '22 at 19:12
  • You may have an older R version; change the `\(x)` to `function(x)` – akrun Jul 30 '22 at 19:12
  • Do you want the summary statistics for all the 3034 columns separately, or just a summary of the 1586 x 3034 matrix for each level of the third dimension? – akrun Jul 30 '22 at 19:14
  • Exactly, I'm looking for just a summary of the 1586 x 3034 matrix for each level of the third dimension. Each of these matrices represents one of the following colors: red, green, blue. – Tou Mou Jul 30 '22 at 19:17
  • If you are reading all the files in the folder, then use `lapply` with `apply`, i.e. `lst1 <- lapply(jpgfiles, function(file) apply(readJPEG(file), 3, function(x) c(min = min(x), max = max(x), sum = sum(x))))` – akrun Jul 30 '22 at 19:17

1 Answer

We could loop over the third dimension with `apply` and `MARGIN = 3`:

out <- apply(im, 3, function(x) c(min =min(x), max = max(x), sum = sum(x)))
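
For reference, `apply()` with `MARGIN = 3` returns a matrix with one row per statistic and one column per channel. The sketch below uses a small synthetic array in place of a real image (an assumption, so that it runs without any file) and adds `var`, which the question also asks for. Each slice reaches the function as a matrix, so `as.vector()` is needed there: `var()` on a matrix would return a covariance matrix instead of a single number.

```r
# Synthetic 2 x 2 x 3 "image" standing in for readJPEG() output.
im_small <- array(c(0, 0.5, 0.5, 1,   # channel 1
                    0, 0, 0, 0,       # channel 2
                    1, 1, 1, 1),      # channel 3
                  dim = c(2, 2, 3))

out_small <- apply(im_small, 3, function(x)
  c(min = min(x), max = max(x), sum = sum(x), var = var(as.vector(x))))

# Label the columns; the channel order red, green, blue is assumed.
colnames(out_small) <- c("red", "green", "blue")

out_small["max", "red"]   # 1
out_small["sum", "blue"]  # 4

# Flatten to a single vector per image (all statistics of channel 1 first).
as.vector(out_small)
```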

If there are multiple files, loop over the file paths with `lapply` and collect the per-image summaries in a list:

lst1 <- lapply(jpgfiles, function(file) apply(readJPEG(file), 3, 
      function(x) c(min = min(x), max = max(x), sum = sum(x))))
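
`jpgfiles` is not defined in the answer; presumably it is a character vector of file paths, for example the result of `list.files()` with `full.names = TRUE`. To turn `lst1` into a clustering dataset with one row per image, something along these lines might work. `stack_summaries()` and the `R`/`G`/`B` column labels are illustrative assumptions, not part of the answer.

```r
# Hypothetical helper: flatten each statistics-by-channel matrix
# (as returned by apply(..., 3, ...)) into one row of a numeric matrix.
stack_summaries <- function(lst, channels = c("R", "G", "B")) {
  first <- lst[[1]]
  # Column-major order: all statistics of channel 1, then channel 2, ...
  labels <- as.vector(outer(rownames(first),
                            channels[seq_len(ncol(first))],
                            function(s, ch) paste(ch, s, sep = "_")))
  # vapply gives one column per image, so transpose to one row per image.
  mat <- t(vapply(lst, as.vector, numeric(length(first))))
  colnames(mat) <- labels
  mat
}

# Usage with the list from the answer:
# dataset <- stack_summaries(lst1)
# rownames(dataset) <- basename(jpgfiles)
```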
akrun