
I want to download a few files from a URL. I know the start of each file name, but the rest varies. It is usually a date, and it can differ from file to file. From Java code, is there any way to download files whose names match a pattern?

If I open the URL below in Chrome, all the files are listed, and I have to download the ones I need manually.

http://<ip_address>:<port>/MR/build/report/scan/daily/2021-12-13_120/data/

File names look like the ones below: a known name followed by a date. The date may be the same as the one in the URL or an older one.

  • scan_report_2021_12_13_120.txt
  • build_report_2021_12_10_110.txt
  • my_reportdata_2021_11_30_110.txt

Currently my Java code looks like the method below. I have to pass the complete URL, including the exact file name, to download a file. In most cases the date and number in the file name are the same as in the URL, so the program takes the date part from the URL, appends it to the file name, and passes the result as the URL. For some files the date differs, though, and those I have to download manually.

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

private static void downloadFile(String remoteURLPath, String localPath) {
    System.out.println("DownloadFileTest.downloadFile() Downloading from " + remoteURLPath + "  to = " + localPath);
    // try-with-resources closes both the channel and the output stream, even on failure
    try (ReadableByteChannel rbc = Channels.newChannel(new URL(remoteURLPath).openStream());
         FileOutputStream fos = new FileOutputStream(localPath)) {
        fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
    } catch (IOException e) {
        // Also covers MalformedURLException, which is a subclass of IOException
        e.printStackTrace();
    }
}

The argument remoteURLPath is passed like http://<ip_address>:<port>/MR/build/report/scan/daily/2021-12-13_120/data/scan_report_2021_12_13_120.txt, and localPath is passed like C:\\MyDir\\MyData\\scan_report_2021_12_13_120.txt.

The other files are requested the same way, with 2021_12_13_120 as the date. Files whose names carry a different date won't be downloaded, but an empty file is still created in the same directory, which I delete later since its size is 0.

Is there any way to pass a pattern here?

Like http://<ip_address>:<port>/MR/build/report/scan/daily/2021-12-13_120/data/scan_report_*.txt

And instead of passing the complete local path, is there a way to pass only the directory, so that each file is saved there under the same name it has on the remote system?

On Linux I can use wget with pattern matching, but I am looking for a Java way that works on all platforms.

wget -r -np -nH --cut-dirs=10 -A "scan_report*.txt" "http://<ip_address>:<port>/MR/build/report/scan/daily/2021-12-13_120/data/"

iCode
  • I suppose if you point your browser to `http://<ip_address>:<port>/MR/build/report/scan/daily/2021-12-13_120/data/` you're getting a directory listing provided by whatever web server software the host is using. `wget` is probably scraping that listing and saving each file it encounters. That means that you'd have to manually implement the same functionality or use an external library, because for sure it's not embedded in `ReadableByteChannel`. See [here](https://stackoverflow.com/questions/17101276/java-download-all-files-and-folders-in-a-directory) for some pointers. – Federico klez Culloca Dec 13 '21 at 07:27
  • @FedericoklezCulloca thanks. I was exactly looking for this. – iCode Dec 15 '21 at 03:24

1 Answer

Thanks to the comment from @FedericoklezCulloca, I modified my code using this answer.

My solution reads the whole HTML page and collects all the href values, since in this listing they contain only the file names with their extensions. I then filter that list down to the matching files and download those with the code from the question.

Method to get the list of all href values from the URL. It can probably be optimised further. I did not use any extra library.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

private static List<String> getAllHREFListFromURL(String downloadURL) {
    List<String> hrefListFromURL = new ArrayList<>();
    // try-with-resources closes the stream, and avoids calling close() on a null stream
    try (InputStream is = new URL(downloadURL).openStream()) {
        // Decode the page in one go so multi-byte characters are never split
        // across buffer boundaries (readAllBytes needs Java 9+; UTF-8 is assumed)
        String page = new String(is.readAllBytes(), StandardCharsets.UTF_8);

        String searchStart = "href=\"";
        String searchEnd = "\"";
        int from = 0;
        while (true) {
            int start = page.indexOf(searchStart, from);
            if (start == -1) {
                break; // no more links
            }
            start += searchStart.length();
            int end = page.indexOf(searchEnd, start);
            if (end == -1) {
                break; // unterminated attribute value
            }
            hrefListFromURL.add(page.substring(start, end));
            from = end + 1;
        }
    } catch (IOException e) {
        // Also covers MalformedURLException, which is a subclass of IOException
        e.printStackTrace();
    }
    return hrefListFromURL;
}

Method to get the list of files that I need.

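import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

private static List<String> getDownloadList(List<String> allHREFListFromURL) {
    List<String> filesList = getMyFilesList();

    List<String> downloadList = new ArrayList<>();
    for (String fileName : filesList) {
        // Match hrefs that start with the known prefix. A trailing "*" is not a
        // regex wildcard: fileName + "*" would only repeat the prefix's last character.
        Predicate<String> fileFilter = Pattern.compile("^" + Pattern.quote(fileName)).asPredicate();
        List<String> collect = allHREFListFromURL.stream().filter(fileFilter).collect(Collectors.toList());
        downloadList.addAll(collect);
    }
    return downloadList;
}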

private static List<String> getMyFilesList() {
    List<String> filesList = new ArrayList<>();
    filesList.add("scan_report");
    filesList.add("build_report");
    filesList.add("my_reportdata");
    return filesList;
}
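
The wildcard form asked about in the question (for example scan_report_*.txt) can also be applied to the href list directly. Below is a minimal sketch, assuming that `*` is the only wildcard needed and that the hrefs are plain file names. The class name GlobFilter and its methods are hypothetical and not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class GlobFilter {

    // Converts a simple glob, where '*' means "any run of characters", into a regex.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        // The -1 limit keeps trailing empty pieces, so a trailing '*' is not lost.
        String[] literals = glob.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote the literal parts so '.' and other regex characters match literally.
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the names that match the glob in full, in their original order.
    static List<String> filter(List<String> names, String glob) {
        Pattern pattern = globToRegex(glob);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of(
                "scan_report_2021_12_13_120.txt",
                "build_report_2021_12_10_110.txt",
                "my_reportdata_2021_11_30_110.txt");
        System.out.println(filter(hrefs, "scan_report_*.txt"));
    }
}
```

Because the whole name has to match, a pattern such as scan_report_*.txt also rejects files with a different extension, which a prefix-only check would let through.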

I then iterate over the downloadList and use my original download method to download each file.
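
For the second part of the question (passing only a target directory and keeping the remote file names), the local path can be derived from each href. This is a rough sketch, assuming the hrefs are plain file names relative to the listing URL. The class DownloadPlan, the method plan, and the example.invalid host are hypothetical placeholders.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DownloadPlan {

    // Maps each remote file URL to a local path that uses the same file name.
    static Map<String, Path> plan(String directoryUrl, Path localDir, List<String> fileNames) {
        String base = directoryUrl.endsWith("/") ? directoryUrl : directoryUrl + "/";
        Map<String, Path> targets = new LinkedHashMap<>();
        for (String name : fileNames) {
            // Skip empty entries and anything that is not a plain file name,
            // such as "../" or links into subdirectories.
            if (name.isEmpty() || name.contains("/")) {
                continue;
            }
            targets.put(base + name, localDir.resolve(name));
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Path> targets = plan(
                "http://example.invalid/data/",
                Paths.get("downloads"),
                List.of("scan_report_2021_12_13_120.txt", "../"));
        // Each entry could then be handed to downloadFile(url, path.toString()).
        targets.forEach((url, path) -> System.out.println(url + " -> " + path));
    }
}
```

Path.resolve uses the platform's own separator, so the same call works for a directory like C:\\MyDir\\MyData on Windows and for a Linux directory.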

iCode