I need to write a Java client application which, given the URL below, will enumerate the directories and files recursively beneath it. I also need the last modified timestamp for each, since I'm only concerned with changes since a known timestamp.
http://www.myserver.com/testproduct/
For example, suppose the following exist on the server.
http://www.myserver.com/testproduct/red/file1.txt
http://www.myserver.com/testproduct/red/file2.txt
http://www.myserver.com/testproduct/red/black/file3.txt
http://www.myserver.com/testproduct/red/black/file4.txt
http://www.myserver.com/testproduct/orange/anotherfile.html
http://www.myserver.com/testproduct/orange/mymovie.avi
http://www.myserver.com/testproduct/readme.txt
Starting at the specified URL (http://www.myserver.com/testproduct/), I need to enumerate the directories and files recursively beneath it, along with the last modified timestamp of each. Once I have the list, I'll selectively download some of the files based on timestamp and other client-side filters.
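To make the timestamp filter concrete, here is a minimal sketch. It assumes the timestamp is available as the value of an HTTP Last-Modified response header (for example from a HEAD request on each file), which uses the RFC 1123 date format. The class and method names are hypothetical and only for illustration.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: decides whether a resource changed after a known timestamp.
final class ModifiedSince {

    // Parses an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    // Assumption: the server sends the header in RFC 1123 format.
    static Instant parseHttpDate(String headerValue) {
        return ZonedDateTime
                .parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
    }

    // True when the Last-Modified value is strictly later than the known timestamp.
    static boolean changedSince(String lastModifiedHeader, Instant knownTimestamp) {
        return parseHttpDate(lastModifiedHeader).isAfter(knownTimestamp);
    }
}
```

With this, a file would only be queued for download when `ModifiedSince.changedSince(headerValue, knownTimestamp)` returns true.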
The server is running Apache and is configured to allow directory listing.
I did some experimentation using Apache's HttpClient Java library. When I request the contents of http://www.myserver.com/testproduct/, I get back an HTML page, which of course is the same thing you see if you go there in your browser: a page showing the contents of the folder.
Is this the only way to do it, i.e. scraping the resulting HTML page to parse out the files and directories? Also, I'm not sure I can reliably distinguish files from directories based on the HTML returned.
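For reference, a minimal sketch of what such scraping might look like is below. It rests on assumptions that are not guaranteed: that the listing is the usual Apache autoindex page, that every entry appears as a relative `href`, and that directory links end with a trailing slash. The class and method names are hypothetical, and a regex is used only to keep the sketch short; a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts child entries from a directory listing page.
final class ListingScraper {

    // Matches the href value of each anchor tag in the page.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of the entries listed under baseUrl.
    // baseUrl is expected to end with "/".
    static List<String> childUrls(String baseUrl, String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            // Skip column-sort links ("?C=N;O=D"), root-relative or parent
            // links, and links that point to other sites.
            if (href.startsWith("?") || href.startsWith("/")
                    || href.startsWith("..") || href.contains("://")) {
                continue;
            }
            result.add(baseUrl + href);
        }
        return result;
    }

    // Assumption: the listing marks directories with a trailing slash.
    static boolean isDirectory(String url) {
        return url.endsWith("/");
    }
}
```

Recursion would then mean calling `childUrls` again for every returned URL for which `isDirectory` is true. Whether the trailing-slash assumption holds for a particular server configuration is exactly the part I'm unsure about.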
Is there a better way to enumerate directories and files without page scraping the resultant HTML?