I have files that arrive either as email attachments or inside zip attachments, so I only have an InputStream for each file, not the file itself or its path. I need to get the creation date/time and last-modified date/time of the file from that InputStream. I tried the Metadata class from Apache Tika, but it does not give me these two values, even though I can see both in the file's properties. I can also get both values for the same files using BasicFileAttributes, but BasicFileAttributes works on a file path and not on a stream. Consider the scenario below:

Say I have a file myTestFile.txt. I can see its createdDateTime and modifiedDateTime in the file properties, and I can read both with BasicFileAttributes. But when I parse the stream of the same file with Apache Tika's Metadata to get createdDateTime and lastModifiedDateTime, I get neither date.

I need a way to get createdDateTime and lastModifiedDateTime from the stream rather than from the file or its path, because in the production environment I'll only have the stream, not the actual file or the file path.

Thanks

  • Do you want to know when the file was first created, or when it was first put on your machine? – Gagravarr Oct 18 '22 at 19:11
  • I need to know when it was created and last modified. – Ruhul Hussain Oct 18 '22 at 21:16
  • On your machine, or on the machine where it was created? – Gagravarr Oct 19 '22 at 08:16
  • On the machine where it was created. Since I'm copying with robocopy, the created, modified and accessed date/times are the same as on the original machine. I can see these dates in the file's properties, and I can also read them using BasicFileAttributes. But I can't get them when I read the metadata with Tika's Metadata class (which works on an InputStream). In the real scenario I won't have the file directly; it will be embedded in a zip or in an email attachment, so I'll have the file's stream but not the file. – Ruhul Hussain Oct 20 '22 at 14:18

1 Answer


I found the solution. I was parsing the file's InputStream with the Parser class to extract the metadata into a Metadata object, and for a few files this returned null for the creation date/time and last-modified date/time.

However, when I parsed the same InputStream with the Tika class instead of the Parser class (both are classes from Apache Tika), it worked, and I'm now able to get all the metadata.

The code below is my older approach, which did not give me the created and last-modified date/time:

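import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public void fetchMetaData(InputStream inputStream) {
    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext pcontext = new ParseContext();

    try {
        // Detect the file type and parse the stream, filling in the metadata object
        Parser parser = new AutoDetectParser();
        parser.parse(inputStream, handler, metadata, pcontext);
        System.out.println("creation date from metadata " + metadata.get("dcterms:created"));
        System.out.println("modified date from metadata " + metadata.get("dcterms:modified"));
        // Print every metadata key that was populated, together with its value
        for (String key : metadata.names()) {
            System.out.println(key + " = " + metadata.get(key));
        }
    } catch (TikaException | SAXException | IOException ex) {
        ex.printStackTrace();
    }
}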

And below is the solution that worked:

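import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;

public void fetchMetaData(InputStream inputStream) {
    Tika tika = new Tika();
    Metadata metadata = new Metadata();
    // Tika.parse returns a Reader over the extracted text; close it once the metadata has been read
    try (Reader reader = tika.parse(inputStream, metadata)) {
        System.out.println("creation date from metadata " + metadata.get("dcterms:created"));  // created date time
        System.out.println("modified date from metadata " + metadata.get("dcterms:modified")); // last modified date time

        // Print every metadata key that was populated, together with its value
        for (String key : metadata.names()) {
            System.out.println(key + " = " + metadata.get(key));
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}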
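
One caveat, as far as I understand it: the dates Tika reports are read from the content of the stream, so a plain file such as myTestFile.txt may still come back without them. For files that arrive inside a zip, the zip entry itself can carry timestamps, and those can be read with the standard java.util.zip classes without Tika. The sketch below is only an illustration under that assumption; the class name ZipEntryTimes and the method readTimes are made up for this example, and either timestamp can be null if the tool that built the zip did not store it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.attribute.FileTime;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not part of Tika): collects the timestamps stored in a zip's entry headers.
final class ZipEntryTimes {

    // Returns entry name -> { creationTime, lastModifiedTime }.
    // Either element may be null when the zip does not record that value.
    static Map<String, FileTime[]> readTimes(InputStream zipStream) throws IOException {
        Map<String, FileTime[]> times = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                times.put(entry.getName(), new FileTime[] {
                        entry.getCreationTime(),     // often null: many zip tools do not store it
                        entry.getLastModifiedTime()  // usually present
                });
            }
        }
        return times;
    }
}
```

Calling ZipEntryTimes.readTimes(zipAttachmentStream) on the stream of a zip attachment gives one map entry per file in the archive. This only covers the zip case; email attachments would need whatever dates the mail library exposes for the attachment part.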