3

I'd like to get the contents of a directory that contains arbitrary files (a typical "Downloads" directory) and determine programmatically whether a given file is an image of any type.

I'm working in Clojure but anything available on the JVM is fair game.

Thanks in advance!

Tim Visher
  • Here is a useful link: http://webcache.googleusercontent.com/search?q=cache:2Gkd-GcGI5AJ:forums.sun.com/thread.jspa%3FthreadID%3D5398376+java+determine+if+a+file+is+an+image&cd=1&hl=sv&ct=clnk&gl=se&client=firefox-a – Johan Kotlinski Nov 18 '10 at 18:03

3 Answers

4

You can use the Tika library, which can detect many types of files and also extract metadata from many of them. I have a very simple Clojure wrapper for it.

Alex Ott
  • Would you mind providing some sample code to the effect of `(filter is-image (file-seq "dir"))`? I'm thinking that's how I want my function that operates on the directory to look. – Tim Visher Nov 19 '10 at 16:28
3

Obviously, the easiest thing to do is look at the filename extension. Of course, it's not necessarily reliable, but it may suffice in some circumstances.
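As a rough illustration of the extension approach, here is a minimal Clojure sketch. The names `image-extensions` and `image-extension?` are made up for this example, and the set of extensions is an assumption that you would adjust to your needs.

```clojure
(require '[clojure.string :as str])

;; Assumed, non-exhaustive set of extensions treated as images.
(def image-extensions
  #{"jpg" "jpeg" "png" "gif" "bmp" "tif" "tiff" "webp"})

(defn image-extension?
  "True if the file name ends in one of image-extensions (case-insensitive).
  Only the name is inspected; the file contents are never read."
  [^java.io.File file]
  (let [fname (.getName file)
        dot   (.lastIndexOf fname ".")]
    ;; (pos? dot) rejects names without a dot and dotfiles such as \".png\".
    (and (pos? dot)
         (contains? image-extensions
                    (str/lower-case (subs fname (inc dot)))))))
```

Something like `(filter image-extension? (.listFiles (java.io.File. "dir")))` would then keep only the files whose names look like images.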

Short of reading the whole image, you could read the first few bytes of the file to identify it by its "magic number". For example, JPEG files always start with the two bytes 0xFFD8 and end with 0xFFD9; PDFs always begin with the string "%PDF".

This saves you the overhead of creating an image in memory, and might speed up your I/O as well (since you only need a few bytes of the file).
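A minimal sketch of the magic-number check in Clojure, using only JDK classes, might look like the following. The signature table is an assumption covering just a handful of common formats, and the names `image-signatures`, `read-header`, and `image-format` are hypothetical.

```clojure
(import '[java.io File FileInputStream])

;; Leading byte signatures for a few common formats (assumed, non-exhaustive).
(def image-signatures
  {:jpeg [0xFF 0xD8 0xFF]
   :png  [0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A]
   :gif  [0x47 0x49 0x46 0x38]   ; "GIF8"
   :bmp  [0x42 0x4D]})           ; "BM"

(defn read-header
  "Returns up to n leading bytes of file as a vector of ints in 0-255."
  [^File file n]
  (with-open [in (FileInputStream. file)]
    (let [buf   (byte-array n)
          ;; Keep reading until the buffer is full or the file ends.
          total (loop [off 0]
                  (let [k (.read in buf off (- n off))]
                    (if (and (pos? k) (< (+ off k) n))
                      (recur (+ off k))
                      (+ off (max k 0)))))]
      (mapv #(bit-and % 0xFF) (take total buf)))))

(defn image-format
  "Returns the keyword of the first signature matching the file's header, or nil."
  [^File file]
  (let [header (read-header file 8)]
    (some (fn [[fmt sig]]
            (when (= sig (take (count sig) header)) fmt))
          image-signatures)))
```

Because `image-format` returns nil for anything it does not recognize, it can be used directly as a predicate, for example `(filter image-format files)`.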

If you don't want to research all these magic numbers yourself, you could try a library such as jMimeMagic. I've never used it, so I can't vouch for its quality or completeness, but it is LGPL. I'm sure you can find other alternatives as well.

eaj
  • The Tika library has a more appropriate license ;-) – Alex Ott Nov 18 '10 at 20:21
  • I wouldn't presume to say what license is most appropriate for somebody else's project, but tika looks like a robust and well-developed library. Thanks for the suggestion. – eaj Nov 18 '10 at 20:30
  • this is very simple, just do (use 'tika) (detect-mime-type file), but instead of file you can use string, url or InputStream – Alex Ott Nov 20 '10 at 09:40
3

I ended up being able to solve this by combining the comment on your question with my earlier answer from here. A minor change to the code lets it work with files that are not images.

I didn't change it to recurse to sub directories. Would be easy enough to do.

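(defn files-in-dir
  "Returns the non-directory entries directly inside dir (no recursion)."
  [dir]
  (filter #(not (.isDirectory %))
          (.listFiles (java.io.File. dir))))

(defn figure-out-height-width
  "Returns [file width height] for every file that ImageIO can decode."
  [files]
  (remove nil?
          (map (fn [file]
                 (with-open [r (java.io.FileInputStream. file)]
                   ;; ImageIO/read returns nil when no reader understands the input
                   (if-let [img (javax.imageio.ImageIO/read r)]
                     [file (.getWidth img) (.getHeight img)])))
               files)))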

user> (pprint (files-in-dir "/home/jmccrary/Downloads/"))                                                                              
(#<File /home/jmccrary/Downloads/Girl_Talk_-_All_Day_(IA123)_mp3s.zip>                                                                 
 #<File /home/jmccrary/Downloads/CSS3-for-Web-Designers.zip>                                                                           
 #<File /home/jmccrary/Downloads/manual.pdf>                                                                                           
 #<File /home/jmccrary/Downloads/test.jpeg>                                                                                            
 #<File /home/jmccrary/Downloads/nautilus-dropbox_0.6.7_amd64.deb>                                                                     
 #<File /home/jmccrary/Downloads/rubygems-1.3.7.tgz>                                                                                   
 #<File /home/jmccrary/Downloads/HTML5-FOR-WEB-DESIGNERS.zip>                                                                          
 #<File /home/jmccrary/Downloads/bcompare-3.1.11.12238.tar.gz>                                                                         
 #<File /home/jmccrary/Downloads/shared_ptr_example.cpp>)                                                                              
nil                                                                                                                                    
user> (figure-out-height-width (files-in-dir "/home/jmccrary/Downloads"))                                                              
([#<File /home/jmccrary/Downloads/test.jpeg> 32 32])
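If subdirectories should be included, one possible variation is to swap `.listFiles` for Clojure's built-in `file-seq`, which walks the whole tree. This is only a sketch; `files-in-tree` is a hypothetical name, not part of the code above.

```clojure
(defn files-in-tree
  "Like files-in-dir, but also descends into subdirectories."
  [dir]
  ;; file-seq yields the root, every directory, and every file below it,
  ;; so the directories are filtered out afterwards.
  (remove #(.isDirectory ^java.io.File %)
          (file-seq (java.io.File. ^String dir))))
```

The result can be passed to `figure-out-height-width` in the same way as the output of `files-in-dir`.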

After thinking about it for a bit, it feels dirty to combine checking whether a file is an image with pulling out the width and height. Alternatively, you could define a function that does the filtering separately and gives you a seq of images.

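(defn filter-images
  "Returns a vector of BufferedImages, one for each file ImageIO can decode."
  [files]
  (reduce (fn [res file]
            ;; ImageIO/read returns nil for files that are not images
            (if-let [img (javax.imageio.ImageIO/read file)]
              (conj res img)
              res))
          []
          files))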

user> (filter-images (files-in-dir "/home/jmccrary/Downloads"))                                                                        
[#<BufferedImage BufferedImage@24753433: type = 5 ColorModel: #pixelBits = 24 numComponents = 3 color space = java.awt.color.ICC_ColorSpace@43036651 transparency = 1 has alpha = false isAlphaPre = false ByteInterleavedRaster: width = 32 height = 32 #numDataElements 3 dataOff[0] = 2>]
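If all you need is a yes/no predicate, in the shape of `(filter image? files)`, a lighter-weight sketch is to ask ImageIO whether any installed reader claims the file, without decoding it into a BufferedImage. The name `image?` is made up here, and a reader claiming a file is not a guarantee that the file will decode cleanly.

```clojure
(import '[javax.imageio ImageIO])

(defn image?
  "True if some installed ImageIO reader claims it can decode the file.
  The readers only probe the start of the stream; no BufferedImage is built."
  [^java.io.File file]
  ;; createImageInputStream can return nil (for example, for a missing file).
  (if-let [stream (ImageIO/createImageInputStream file)]
    (with-open [in stream]
      (.hasNext (ImageIO/getImageReaders in)))
    false))
```

With the helper defined earlier in this answer, `(filter image? (files-in-dir "/home/jmccrary/Downloads"))` would return the image files themselves rather than the decoded images.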

Jake McCrary