9
  • The answer to "How can I determine if a file is a PDF file?" recommends downloading another library, but I just need to check whether a file in a directory is of type PDF or not.

  • Using a complete library for this looks like overkill.

  • Is there any way to tell whether a Java File is of type PDF?
hippietrail
daydreamer
  • 3
    Why don't you want to use a library? What is the use case for this? Looking at the extension is usually not a good idea, because anyone, or any other program, can change an extension. Without looking inside the file, it is hard to determine whether it really is a PDF. For this I recommend using a library. – peshkira Nov 08 '12 at 20:14
  • Related/duplicate: http://stackoverflow.com/questions/1915317/howto-extract-mimetype-from-a-byte – Tomasz Nurkiewicz Nov 08 '12 at 20:14
  • Try having a look at http://stackoverflow.com/questions/51438/getting-a-files-mime-type-in-java – MadProgrammer Nov 08 '12 at 20:18

8 Answers

19

According to Wikipedia, PDF files start with the magic number "%PDF" (hex 25 50 44 46), so you could open an InputStream on the file and check whether its first four bytes match.
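
A minimal sketch of that check, using only the JDK. The class and method names (PdfMagicCheck, startsWithPdfMagic) are made up for illustration. It only confirms that the file begins with the "%PDF" signature; it does not prove that the rest of the file is a well-formed PDF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class PdfMagicCheck {

    // "%PDF" as raw bytes: 0x25 0x50 0x44 0x46
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the first four bytes of the file are "%PDF". */
    static boolean startsWithPdfMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[PDF_MAGIC.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
            return Arrays.equals(header, PDF_MAGIC);
        }
    }
}
```

Usage would be something like `PdfMagicCheck.startsWithPdfMagic(file.toPath())` for an existing java.io.File.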

ElderMael
6

SimpleMagic is a Java library for resolving content types:

<!-- pom.xml -->
    <dependency>
        <groupId>com.j256.simplemagic</groupId>
        <artifactId>simplemagic</artifactId>
        <version>1.8</version>
    </dependency>

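// Test and logging imports assume JUnit 4 and SLF4J; adjust for your setup
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

import java.io.File;
import java.io.IOException;

import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.j256.simplemagic.ContentInfo;
import com.j256.simplemagic.ContentInfoUtil;
import com.j256.simplemagic.ContentType;

public class SimpleMagicSmokeTest {

    private final static Logger log = LoggerFactory.getLogger(SimpleMagicSmokeTest.class);

    @Test
    public void smokeTestSimpleMagic() throws IOException {
        ContentInfoUtil util = new ContentInfoUtil();
        File possiblePdfFile = new File("/path/to/possiblePdfFile.pdf");
        ContentInfo info = util.findMatch(possiblePdfFile);

        // findMatch returns null when no content type matches
        assertNotNull( info );
        log.info( info.toString() );
        assertEquals( ContentType.PDF, info.getContentType() );
    }
}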
Abdull
2

A somewhat hackish solution is to look at the full file name and see whether it ends in ".pdf". The following should help; note that FileDataSource derives the MIME type from the file name, not from the file's contents:

import javax.activation.*;  

public class ShowMimeType  
{  
    public static void main(String[] args) {  
        FileDataSource ds = new FileDataSource(args[0]);  
        String contentType = ds.getContentType();  
        System.out.println("The MIME type of the file " + args[0] + " is: " + contentType);  
    }  
}  
awolfe91
2

If checking the file extension is not satisfactory, you could try checking the file's magic number by reading its first few bytes.

PDF files start with "%PDF" (hex 25 50 44 46).
case1352
0

This combines the lighter URLConnection.guessContentTypeFromStream(), which returns null for some MIME types, with the heavier AutoDetectParser from Apache Tika.

if (currentImageType == null) {
    ByteArrayInputStream is = new ByteArrayInputStream(image);
    // Cheap check first: the JDK guesses from the leading bytes
    String mimeType = URLConnection.guessContentTypeFromStream(is);
    if (mimeType == null) {
        // Fall back to Tika's detector when the JDK cannot tell
        AutoDetectParser parser = new AutoDetectParser();
        Detector detector = parser.getDetector();
        Metadata md = new Metadata();
        mimeType = detector.detect(is, md).toString();
    }

    // Normalize the detected MIME type to a short type name
    if (mimeType.contains("png")) {
        mimeType = "png";
    } else if (mimeType.contains("jpg") || mimeType.contains("jpeg")) {
        mimeType = "jpg";
    } else if (mimeType.contains("pdf")) {
        mimeType = "pdf";
    } else if (mimeType.contains("tif")) {
        // also matches "tiff"
        mimeType = "tif";
    }

    currentImageType = ImageType.fromValue(mimeType);
}
Akin Okegbile
0

I tried the code below (Android) and it worked.

public static boolean isSelectedFilePdf(Uri uri, ContentResolver contentResolver) {
    if (uri == null) {
        return false;
    }
    if ("content".equals(uri.getScheme())) {
        // content:// URIs: ask the resolver for the MIME type
        String type = contentResolver.getType(uri);
        return type != null && type.startsWith("application/pdf");
    }
    // Other schemes (e.g. file://): fall back to the file name extension
    String fileName = uri.getLastPathSegment();
    int dot = (fileName == null) ? -1 : fileName.lastIndexOf('.');
    return dot >= 0 && fileName.substring(dot).equalsIgnoreCase(".pdf");
}
andro-girl
0

The following solution is mentioned at Check whether a PDF-File is valid (Python)

In a project of mine I need to check the MIME type of some uploaded file. I simply use the file command like this:

from subprocess import Popen, PIPE
filetype = Popen("/usr/bin/file -b --mime -", shell=True, stdout=PIPE, stdin=PIPE).communicate(file.read(1024))[0].strip()

Of course, you might want to move the actual command into a configuration file, since command-line options vary among operating systems (e.g. Mac).

If you just need to know whether it's a PDF or not, and do not need to process it anyway, I think the file command is a faster solution than a library. Doing it by hand is of course also possible, but the file command may give you more flexibility if you want to check for different types.
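
Since the question is about Java, here is a rough sketch of the same idea using ProcessBuilder. It assumes a `file` executable is on the PATH and that it accepts `-b --mime-type`; as noted above, options can differ between systems. The class and method names are hypothetical, and the method returns an empty Optional when the command is missing or fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, not part of any library.
final class FileCommandMimeType {

    /**
     * Runs `file -b --mime-type <path>` and returns its first output line,
     * or Optional.empty() if the command cannot be started or exits non-zero.
     */
    static Optional<String> detect(Path path) throws InterruptedException {
        // Arguments are passed as a list, so no shell quoting is involved
        ProcessBuilder pb = new ProcessBuilder("file", "-b", "--mime-type", path.toString());
        pb.redirectErrorStream(true);
        try {
            Process process = pb.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line = reader.readLine();
                int exitCode = process.waitFor();
                if (exitCode != 0 || line == null) {
                    return Optional.empty();
                }
                return Optional.of(line.trim());
            }
        } catch (IOException e) {
            // Typically means the `file` command is not installed
            return Optional.empty();
        }
    }
}
```

A caller could then compare the result against "application/pdf" and treat an empty Optional as "unknown".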

caot
-1

This might sound a little bit too obvious, but check the extension on the filename.

If it's good enough for Explorer, it should be good enough for you.
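
For completeness, an extension check could look like the sketch below. The class and method names are made up for illustration. It inspects only the name, so it cannot tell a renamed file from a real PDF.

```java
import java.util.Locale;

// Hypothetical helper, not part of any library.
final class PdfExtensionCheck {

    /** Returns true if the file name ends in ".pdf", ignoring case. */
    static boolean hasPdfExtension(String fileName) {
        return fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

For a java.io.File this would be called as `PdfExtensionCheck.hasPdfExtension(file.getName())`.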

Jacob Schoen
  • @peshkira well, it's supposed to. Only rarely you can't trust it. – John Dvorak Nov 08 '12 at 20:14
  • 1
    On what grounds do you base your comment? How can you say it is rare? This depends on the use case. You say it is rare because you probably don't do it or don't encounter it, but this doesn't mean it does not happen in a real-world scenario. – peshkira Nov 08 '12 at 20:17
  • 2
    I would say it is a bad idea to base design decisions on the way _Microsoft Explorer_ does things.... I think most would agree that Windows is not perfect (and far from it). – jahroy Nov 08 '12 at 20:19