I am working on a Spring Boot application that receives file data encoded in Base64 and forwards it to a recipient user.

I upload the following data:

{
    "fileName":"excelfile",
    "fileExtension":"xlsx",
    "fileData":"<base64 string>"
}

I want to validate the uploaded data, so I need to make sure that the Base64 string actually decodes into an xlsx Excel spreadsheet. So far I have tried URLConnection and Apache Tika; however, Tika could only tell from the file name whether an application/vnd.openxmlformats-officedocument.xxxx payload is actually an Excel spreadsheet or not.

As it stands, a user could simply upload a Base64-encoded Word document with an xlsx extension and confuse the application.

For my application it would be enough to extract the file extension from the Base64 string. Are there utility libraries for such tasks? I know that the public tool Base64.Guru has its way of determining the actual subtype and extension of an openxmlformats-officedocument file, so I don't think it is an impossible task.
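To make the goal concrete, here is a minimal sketch of the kind of check I have in mind, using only the JDK. It relies on the fact that OOXML files are ZIP archives, and it assumes the conventional main part names that Office writes (`xl/workbook.xml`, `word/document.xml`, `ppt/presentation.xml`); a stricter check would read `[Content_Types].xml` instead. The class name `OoxmlSniffer` and the method `detectExtension` are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: guesses the OOXML subtype from the ZIP entry names. */
final class OoxmlSniffer {

    private OoxmlSniffer() {}

    /**
     * Returns "xlsx", "docx" or "pptx" when the decoded payload is a ZIP archive
     * containing the main part conventionally used by that format; empty otherwise.
     * Assumption: the main part uses the default name written by Office.
     */
    static Optional<String> detectExtension(String base64) {
        final byte[] bytes;
        try {
            bytes = Base64.getDecoder().decode(base64);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not valid Base64
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("xl/workbook.xml")) return Optional.of("xlsx");
                if (name.equals("word/document.xml")) return Optional.of("docx");
                if (name.equals("ppt/presentation.xml")) return Optional.of("pptx");
            }
        } catch (IOException e) {
            return Optional.empty(); // corrupt or truncated archive
        }
        return Optional.empty(); // not a ZIP, or a ZIP without a known main part
    }
}
```

With something like this, the upload could be rejected whenever `detectExtension(fileData)` does not match the declared `fileExtension`. Note that this sketch would also report macro-enabled workbooks (xlsm) as xlsx, since they share the same main part name, and that `Base64.getDecoder()` rejects input containing line breaks or a `data:` URI prefix.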

Anish B.
MEZesUBI
  • Does [this](https://stackoverflow.com/questions/25763533/how-to-identify-file-type-by-base64-encoded-string-of-a-image) help? – Ankur Singhal Jul 05 '19 at 15:21
  • @AnkurSinghal Unfortunately it does not help: image types, not being application types, are identified easily. – MEZesUBI Jul 06 '19 at 19:16
  • How are you calling Tika? If you don't give it a file name, it will read the bytes for file detection. You'll need to have the tika-parser package in your path to get fine-grained ooxml detection. – Tim Allison Jul 18 '19 at 19:49
  • I actually ended up creating a map for all Mimetypes and their respective file extensions. But initially I used a default TikaConfig object, a TikaMetadata object with the filename supplied, then used a Detector from TikaConfig. – MEZesUBI Jul 22 '19 at 07:20

0 Answers