How can I sort the names of Apache source archives (strings) by version number? I tried the code below with Jsoup, but it does not return the expected result. How can I solve this problem?

public static void getApacheArchives() throws IOException{
    String url = "https://archive.apache.org/dist/httpd/"; // or whatever goes here
    Document document = Jsoup.connect(url).followRedirects(false).timeout(60000/*wait up to 60 sec for response*/).get();

    Elements anchors = document.body().getAllElements().select("a");

    Collections.sort(anchors, new Comparator<Element>() {
        @Override
        public int compare(Element e1, Element e2) {
            return e1.text().compareTo(e2.text());
        }
    });

    for (int i = 0; i < anchors.size(); i++) {
        Element a = anchors.get(i);
        // Dots are escaped so they match a literal "." rather than any character
        if (a.text().matches("apache_1\\.[0-9]\\.[0-9]{1,2}\\.tar\\.gz")
                || a.text().matches("httpd-[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}\\.tar\\.gz")) {
            System.out.println(a.text());
        }
    }
}

This code returns the following result:
...
httpd-2.3.6.tar.gz
httpd-2.3.8.tar.gz
httpd-2.4.1.tar.gz
httpd-2.4.10.tar.gz
httpd-2.4.12.tar.gz
httpd-2.4.16.tar.gz
httpd-2.4.17.tar.gz
httpd-2.4.18.tar.gz
httpd-2.4.2.tar.gz
httpd-2.4.20.tar.gz
httpd-2.4.3.tar.gz
httpd-2.4.4.tar.gz
httpd-2.4.6.tar.gz
httpd-2.4.7.tar.gz
httpd-2.4.9.tar.gz
...

But the expected result is:
...
httpd-2.3.6.tar.gz
httpd-2.3.8.tar.gz
httpd-2.4.1.tar.gz
httpd-2.4.2.tar.gz
httpd-2.4.3.tar.gz
httpd-2.4.4.tar.gz
httpd-2.4.6.tar.gz
httpd-2.4.7.tar.gz
httpd-2.4.9.tar.gz
httpd-2.4.10.tar.gz
httpd-2.4.12.tar.gz
httpd-2.4.16.tar.gz
httpd-2.4.17.tar.gz
httpd-2.4.18.tar.gz
httpd-2.4.20.tar.gz
...

adampweb
  • What's the question? The output is exactly as expected, ".10" is sorted before ".2" because that's their order as strings. Nothing in the code tries to deal with that. You'll need to write code to parse the string into parts (probably with a regular expression), compare the string parts as strings, each individual number part as number. So try to do that, and if you run into trouble, ask a specific question about your code trying to do it. – T.J. Crowder Jun 25 '16 at 10:34
  • You need to extract the numbers from the anchor texts and compare those in the comparator. What I see here compares the raw text, which is a mixture of letters and numbers. – Seek Addo Jun 25 '16 at 10:35
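
The approach described in the comments (split each name into parts, compare the text parts as strings and the number parts as numbers) could look roughly like the sketch below. It is not code from the question: the class name `ArchiveNameSort`, the helper names, and the assumed name shape (text prefix, dot-separated version, suffix) are all hypothetical, and it assumes each version component fits in a `long`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchiveNameSort {
    // Assumed shape: non-digit prefix, dot-separated numbers, then any suffix.
    private static final Pattern NAME = Pattern.compile("(\\D*)(\\d+(?:\\.\\d+)*)(.*)");

    // Splits a name into {prefix, version, suffix}; a name without digits becomes {name, "", ""}.
    static String[] split(String name) {
        Matcher m = NAME.matcher(name);
        return m.matches()
                ? new String[] {m.group(1), m.group(2), m.group(3)}
                : new String[] {name, "", ""};
    }

    // Compares two dot-separated versions part by part, as numbers.
    static int compareVersions(String v1, String v2) {
        String[] p1 = v1.isEmpty() ? new String[0] : v1.split("\\.");
        String[] p2 = v2.isEmpty() ? new String[0] : v2.split("\\.");
        for (int i = 0; i < Math.min(p1.length, p2.length); i++) {
            int c = Long.compare(Long.parseLong(p1[i]), Long.parseLong(p2[i]));
            if (c != 0) {
                return c;
            }
        }
        // All shared parts are equal: the version with fewer parts sorts first.
        return Integer.compare(p1.length, p2.length);
    }

    // Orders by prefix (as text), then version (as numbers), then suffix (as text).
    static final Comparator<String> BY_VERSION = (a, b) -> {
        String[] x = split(a);
        String[] y = split(b);
        int c = x[0].compareTo(y[0]);
        if (c == 0) {
            c = compareVersions(x[1], y[1]);
        }
        if (c == 0) {
            c = x[2].compareTo(y[2]);
        }
        return c;
    };

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "httpd-2.4.10.tar.gz", "httpd-2.4.2.tar.gz", "httpd-2.4.9.tar.gz",
                "apache_1.3.9.tar.gz", "httpd-2.0.35.tar.gz"));
        names.sort(BY_VERSION);
        names.forEach(System.out::println);
    }
}
```

With the Jsoup `Elements` from the question, the same comparator could presumably be applied as `anchors.sort(Comparator.comparing(Element::text, ArchiveNameSort.BY_VERSION))`, since `Elements` is a list of `Element`.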

1 Answer

Thanks Tom, I found the solution to my problem in "Sorting Strings that contains number in Java".

With some modification:

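Collections.sort(anchors, new Comparator<Element>() {
    @Override
    public int compare(Element o1, Element o2) {
        // Integer.compare avoids the overflow that plain subtraction can cause
        return Integer.compare(extractInt(o1.text()), extractInt(o2.text()));
    }

    // Concatenates every digit in the name, e.g. "httpd-2.4.10.tar.gz" -> 2410
    int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        // return 0 if no digits found
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }
});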
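
One caveat with a key built by concatenating digits: it ignores where the dots are, so httpd-2.0.35.tar.gz becomes 2035 and would sort after httpd-2.2.3.tar.gz (223). A minimal sketch of a weighted key that keeps the "one number per name" idea is shown below. It assumes every name of interest contains a major.minor.patch version with each component below 1000; the class `VersionKeyDemo` and the helper `versionKey` are hypothetical names, not part of the linked solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionKeyDemo {
    // Assumption: the version looks like major.minor.patch, each part 1 to 3 digits long.
    private static final Pattern VERSION =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // Maps e.g. "httpd-2.4.10.tar.gz" to 2004010; returns -1 when no version is found.
    static long versionKey(String name) {
        Matcher m = VERSION.matcher(name);
        if (!m.find()) {
            return -1L;
        }
        long major = Long.parseLong(m.group(1));
        long minor = Long.parseLong(m.group(2));
        long patch = Long.parseLong(m.group(3));
        // Each component gets its own three-digit "slot", so 2.0.35 < 2.2.3 < 2.4.10.
        return major * 1_000_000L + minor * 1_000L + patch;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "httpd-2.4.10.tar.gz", "httpd-2.2.3.tar.gz",
                "httpd-2.4.9.tar.gz", "httpd-2.0.35.tar.gz"));
        names.sort(Comparator.comparingLong(VersionKeyDemo::versionKey));
        names.forEach(System.out::println);
    }
}
```

This key ignores the text prefix, so apache_1.x and httpd-2.x names are ordered purely by version; names without a recognizable version all share the key -1 and end up first.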
adampweb