I am working with text files, images and documents (.log, .txt, .pdf, .doc, .docx, .jpeg, .jpg, .png, .tiff, etc.). I need to extract some metadata from these files, and I want to determine each file's type from its content rather than from its extension. So, my questions are:
Q1. How can I differentiate between these categories of files (plain text files, text documents such as .docx, PDFs, images) using Java?
Q2. Is there a Java library that would help with this?
Q3. Do PDFs that contain scanned images differ from PDFs that contain text in any detectable property?
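To make Q1 concrete, this is roughly the kind of content-based check I have in mind: a minimal sketch that looks only at the leading "magic" bytes of a file. The class name ContentSniffer and the returned labels are made up for illustration, it assumes Java 11+ for InputStream.readNBytes, and the plain-text check (no NUL byte in the first 512 bytes) is only a crude heuristic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: guesses a broad file category from leading bytes only.
final class ContentSniffer {

    // Returns true if data begins with the given signature (values 0..255).
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies the first bytes of a file; the labels are illustrative only.
    static String detect(byte[] head) {
        if (startsWith(head, 0x25, 0x50, 0x44, 0x46, 0x2D)) {
            return "pdf"; // "%PDF-"
        }
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "jpeg";
        }
        if (startsWith(head, 0x49, 0x49, 0x2A, 0x00)
                || startsWith(head, 0x4D, 0x4D, 0x00, 0x2A)) {
            return "tiff"; // little-endian or big-endian header
        }
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) {
            return "zip-container"; // .docx is a ZIP archive, but so are .xlsx, .jar, ...
        }
        if (startsWith(head, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) {
            return "ole2-container"; // legacy .doc, but also .xls, .ppt, ...
        }
        // Crude fallback: a NUL byte suggests binary data (this misfires on UTF-16 text).
        for (byte b : head) {
            if (b == 0) {
                return "unknown-binary";
            }
        }
        return "probably-text"; // note: an empty file also ends up here
    }

    // Reads at most the first 512 bytes of the file and classifies them.
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(512));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + detect(Paths.get(arg)));
        }
    }
}
```

As far as I can tell, this cannot distinguish a .docx from any other ZIP-based file, or a .doc from other OLE2 files, without looking inside the container, and it tells me nothing about Q3.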
PS: I don't have much expertise in this area, so please correct me if any of my questions rest on a wrong assumption.