
I have a backend service in Java that uploads files to the server, but it seems that some unwanted file types are being uploaded.

For example, if I have a foo.jpg file and rename it to foo.pdf, it still gets uploaded. How can I check the actual content of foo.pdf? Below is the code that I am using:

for (Part part : request.getParts()) {
    if (part.getName().startsWith("file")) {
        String filename = part.getHeader("content-disposition");
        filename = filename.replaceFirst("(?i)^.*filename=\"([^\"]+)\".*$", "$1");
        String fileType = part.getContentType();
        DocumentUpload documentUpload = new DocumentUpload();
        documentUpload.setFilename(filename);
        documentUpload.setFileType(fileType);
        documentUpload.setPayload(part.getInputStream());     
        response = documentService.save(documentUpload, uriInfo);
        break;
    }
}
Zac
Abx
  • 2
    You want to check on the server if the bytes you got are really of the indicated file type? – Henry Dec 13 '17 at 07:05
  • 2
    Getting the file type from the HTTP request means you trust the browser. If you need more control, you can save the file to a temporary location and then try to identify its type by content. There are a number of related questions here, for instance https://stackoverflow.com/questions/9738597/how-to-reliably-detect-file-types – Pak Uula Dec 13 '17 at 07:09

1 Answer


You can use the Apache Tika library.

Then you can find the actual MIME type like this:

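import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public String getMimetype(BaseDocument document) {
    ContentHandler contenthandler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    // The file name is passed as a hint alongside the file's bytes.
    metadata.set(Metadata.RESOURCE_NAME_KEY, document.getName());
    Parser parser = new AutoDetectParser();
    try {
        // Pass an empty ParseContext rather than null to avoid a possible NullPointerException.
        parser.parse(new ByteArrayInputStream(document.getFile()), contenthandler, metadata, new ParseContext());
    } catch (IOException | SAXException | TikaException e) {
        throw new IllegalStateException("Could not detect the MIME type", e);
    }
    return metadata.get(Metadata.CONTENT_TYPE);
}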

In the code above, BaseDocument is just a custom object containing information about the document.

You can also get the actual extension for the file like this:

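import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;

public String getExtension(BaseDocument document) {
    TikaConfig config = TikaConfig.getDefaultConfig();
    try {
        // Detect the media type from the bytes, then look up its registered extension.
        MediaType mediaType = config.getMimeRepository().detect(new ByteArrayInputStream(document.getFile()), new Metadata());
        MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
        return mimeType.getExtension();
    } catch (MimeTypeException | IOException e) {
        // Fail loudly instead of returning from a null mimeType.
        throw new IllegalStateException("Could not determine the file extension", e);
    }
}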
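
If adding Tika is not an option, the same idea (look at the bytes, not the name or the Content-Type header) can be sketched with the standard library only. This is a minimal illustration, not a replacement for Tika: the class `MagicBytes` and its `sniff` method are hypothetical names, it recognizes only the two formats from the question, and it assumes the signature sits at the very start of the file (Java 9+ for `readNBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses a MIME type from a file's leading "magic" bytes.
final class MagicBytes {

    private static final int HEAD = 8; // number of leading bytes to inspect
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    /** Returns a MIME type guessed from the first bytes, or null if unknown. */
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(HEAD);
        byte[] head = new byte[HEAD];
        int n = in.readNBytes(head, 0, HEAD); // may be shorter than HEAD for tiny files
        in.reset(); // rewind so the caller can still read the whole payload

        if (startsWith(head, n, PDF)) return "application/pdf";
        if (startsWith(head, n, JPEG)) return "image/jpeg";
        return null;
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // A JPEG renamed to foo.pdf still starts with the JPEG signature.
        byte[] renamedJpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};
        byte[] realPdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(new ByteArrayInputStream(renamedJpeg))); // prints image/jpeg
        System.out.println(sniff(new ByteArrayInputStream(realPdf)));     // prints application/pdf
    }
}
```

In the upload loop from the question, one way to use this would be to wrap `part.getInputStream()` in a `BufferedInputStream` (which supports mark/reset), call `sniff` on it, and reject the part when the result differs from the type you expect before handing the same stream to `documentService.save`.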
ddarellis