How can I determine if a file is an image on the JVM?

Question

I'd like to get the contents of a directory that contains arbitrary files (a typical "Downloads" directory) and determine programmatically whether a given file is an image of any type.

I'm working in Clojure but anything available on the JVM is fair game.

Thanks in advance!

Here is a useful link: http://webcache.googleusercontent.com/search?q=cache:2Gkd-GcGI5AJ:forums.sun.com/thread.jspa%3FthreadID%3D5398376+java+determine+if+a+file+is+an+image&cd=1&hl=sv&ct=clnk&gl=se&client=firefox-a — Johan Kotlinski, Nov 18 '10 at 18:03

score 4 · Answer 1 · answered Nov 18 '10 at 20:20


You can use the Tika library, which can detect many file types and extract metadata from many of them. I have a very simple Clojure wrapper for it.

answered Nov 18 '10 at 20:20

Alex Ott


Would you mind providing some sample code to the effect of `(filter is-image (file-seq "dir"))`? I'm thinking that's how I want my function that operates on the directory to look. – Tim Visher Nov 19 '10 at 16:28

score 3 · Answer 2 · answered Nov 18 '10 at 18:15


Obviously, the easiest thing to do is look at the filename extension. Of course, it's not necessarily reliable, but it may suffice in some circumstances.
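A minimal sketch of the extension check might look like the following. The set `image-extensions` and the function name `image-extension?` are illustrative assumptions, not part of any library, and the list of extensions is hand-picked rather than complete.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, hand-picked set of extensions to treat as images.
(def image-extensions #{"jpg" "jpeg" "png" "gif" "bmp"})

(defn image-extension?
  "True if the file's name ends in one of image-extensions (case-insensitive).
  Looks only at the name, never at the file's contents."
  [^java.io.File file]
  (let [fname (.getName file)
        dot   (.lastIndexOf fname ".")]
    ;; (pos? dot) rejects names with no dot and dotfiles such as \".png\".
    (and (pos? dot)
         (contains? image-extensions
                    (str/lower-case (subs fname (inc dot)))))))
```

It could be used as `(filter image-extension? (.listFiles (java.io.File. "dir")))`. A renamed file will fool it, which is the reliability problem mentioned above.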

Short of reading the whole image, you could read the first few bytes of the file to identify it by its "magic number". For example, JPEG files always start with the two bytes 0xFFD8 and end with 0xFFD9; PDFs always begin with the string "%PDF".

This saves you the overhead of creating an image in memory, and might speed up your I/O as well (since you only need a few bytes of the file).
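As a rough sketch of the magic-number approach, the code below reads the first eight bytes of a file and compares them against a small table of signatures. The names `signatures`, `read-header`, and `image-type` are made up for this example, and the table covers only JPEG, PNG, and GIF; treat it as a starting point under those assumptions.

```clojure
;; Leading bytes for a few common formats. Illustrative, not exhaustive.
(def signatures
  {:jpeg [0xFF 0xD8 0xFF]
   :png  [0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A]
   :gif  [0x47 0x49 0x46 0x38]})   ; the ASCII text "GIF8"

(defn read-header
  "Returns up to n leading bytes of file as a vector of ints in 0..255."
  [^java.io.File file n]
  (with-open [in (java.io.FileInputStream. file)]
    (let [buf    (byte-array n)
          n-read (.read in buf)]            ; -1 for an empty file
      ;; mapv is eager, so the bytes are copied before the stream closes.
      (mapv #(bit-and (int %) 0xFF) (take (max n-read 0) buf)))))

(defn image-type
  "Returns :jpeg, :png or :gif if the file starts with a known signature, else nil."
  [file]
  (let [header (read-header file 8)]
    (some (fn [[kind sig]]
            (when (= sig (take (count sig) header)) kind))
          signatures)))
```

A predicate for `filter` is then just `(comp some? image-type)`. A matching signature only shows that the file claims to be that format; a truncated or corrupt file would still pass.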

If you don't want to research all these magic numbers yourself, you could try a library such as jMimeMagic. I've never used it, so I can't vouch for its quality or completeness, but it is LGPL. I'm sure you can find other alternatives as well.

answered Nov 18 '10 at 18:15

eaj



The Tika library has a more appropriate license ;-) – Alex Ott Nov 18 '10 at 20:21
I wouldn't presume to say what license is most appropriate for somebody else's project, but Tika looks like a robust and well-developed library. Thanks for the suggestion. – eaj Nov 18 '10 at 20:30
This is very simple: just do (use 'tika) (detect-mime-type file). Instead of a file you can also pass a string, a URL or an InputStream. – Alex Ott Nov 20 '10 at 09:40

score 3 · Accepted Answer · edited May 23 '17 at 12:04

I was able to solve this by combining the comment on your question with my earlier answer from here. A minor change to the code lets it work with files that are not images.

I didn't change it to recurse into subdirectories, but that would be easy enough to do.

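;; Returns the entries directly inside dir that are not directories.
(defn files-in-dir [dir]
  (filter #(not (.isDirectory %))
          (.listFiles (java.io.File. dir))))

;; Returns [file width height] for every file ImageIO can decode.
;; ImageIO/read returns nil when no registered reader recognizes the
;; input, so non-image files are dropped by (remove nil? ...).
(defn figure-out-height-width
  [files]
  (remove nil?
          (map (fn [file]
                 (with-open [r (java.io.FileInputStream. file)]
                   (if-let [img (javax.imageio.ImageIO/read r)]
                     [file (.getWidth img) (.getHeight img)])))
               files)))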

user> (pprint (files-in-dir "/home/jmccrary/Downloads/"))
(#<File /home/jmccrary/Downloads/Girl_Talk_-_All_Day_(IA123)_mp3s.zip>
 #<File /home/jmccrary/Downloads/CSS3-for-Web-Designers.zip>
 #<File /home/jmccrary/Downloads/manual.pdf>
 #<File /home/jmccrary/Downloads/test.jpeg>
 #<File /home/jmccrary/Downloads/nautilus-dropbox_0.6.7_amd64.deb>
 #<File /home/jmccrary/Downloads/rubygems-1.3.7.tgz>
 #<File /home/jmccrary/Downloads/HTML5-FOR-WEB-DESIGNERS.zip>
 #<File /home/jmccrary/Downloads/bcompare-3.1.11.12238.tar.gz>
 #<File /home/jmccrary/Downloads/shared_ptr_example.cpp>)
nil
user> (figure-out-height-width (files-in-dir "/home/jmccrary/Downloads"))
([#<File /home/jmccrary/Downloads/test.jpeg> 32 32])

After thinking about it for a bit, it feels dirty to combine the check for whether a file is an image with pulling out the width and height. Alternatively, you could define a function that does the filtering separately and gives you a seq of images.

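;; Returns a vector of the BufferedImages that ImageIO could decode
;; from files; files with no matching reader are skipped.
(defn filter-images
  [files]
  (reduce (fn [res file]
            (if-let [img (javax.imageio.ImageIO/read file)]
              (conj res img)
              res))
          []
          files))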

user> (filter-images (files-in-dir "/home/jmccrary/Downloads"))
[#<BufferedImage BufferedImage@24753433: type = 5 ColorModel: #pixelBits = 24 numComponents = 3 color space = java.awt.color.ICC_ColorSpace@43036651 transparency = 1 has alpha = false isAlphaPre = false ByteInterleavedRaster: width = 32 height = 32 #numDataElements 3 dataOff[0] = 2>]
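If only a yes/no answer is needed, one possible variation is to ask ImageIO whether any registered reader claims the file, without decoding the pixels. This is a sketch, not part of the original solution: the name `image-file?` is made up here, and it relies on `ImageIO/createImageInputStream` returning nil when it cannot open the input.

```clojure
(import '(javax.imageio ImageIO))

(defn image-file?
  "True if ImageIO has a registered reader that claims it can decode file.
  Readers inspect only the start of the stream; no image is built in memory."
  [^java.io.File file]
  (if-let [stream (ImageIO/createImageInputStream file)]
    (with-open [s stream]
      (.hasNext ^java.util.Iterator (ImageIO/getImageReaders s)))
    false))   ; the stream could not be created, e.g. the file is unreadable
```

It plugs straight into a filter, for example `(filter image-file? (files-in-dir "/home/jmccrary/Downloads"))`. Like `ImageIO/read`, it only recognizes formats for which an ImageIO reader is installed, and a file that passes this check can still fail to decode fully.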

Great answer. As soon as I get around to playing with the code, I'll probably accept. Thanks so much. — Tim Visher, Nov 22 '10 at 16:19
