
Do you happen to know of an open-source Java component that can scan a set of dynamic pages (JSP) and extract all the input parameters from them? Of course, a crawler can only crawl static content, not dynamic code, but my idea is to extend it to crawl a web server including all the server-side code. Naturally, I am assuming that the tool will have full access to the crawled web server rather than relying on any hacks.

The idea is to build a static analyzer that can detect all parameter fields (request.getParameter() and the like) in all dynamic pages.

user164701

1 Answer


The idea is to build a static analyzer that has the capacity to detect all parameter fields from all dynamic pages.

You cannot use a web crawler (basically, an HTML parser) to extract the request parameters that the server-side code reads. At most, it can scan the HTML structure of the generated pages. You can use, for example, Jsoup for this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Fetch the page, then list every form and the input fields it contains.
for (Element form : Jsoup.connect("http://google.com").get().select("form")) {
    System.out.printf("Form found: action=%s, method=%s%n", form.attr("action"), form.attr("method"));
    for (Element input : form.select("input,select,textarea")) {
        System.out.printf("\tInput found: name=%s, value=%s%n", input.attr("name"), input.attr("value"));
    }
}

This currently prints:

Form found: action=, method=
    Input found: name=hl, value=en
    Input found: name=source, value=hp
    Input found: name=ie, value=ISO-8859-1
    Input found: name=q, value=
    Input found: name=btnG, value=Google Search
    Input found: name=btnI, value=I'm Feeling Lucky
    Input found: name=, value=
Form found: action=/search, method=
    Input found: name=hl, value=en
    Input found: name=source, value=hp
    Input found: name=ie, value=ISO-8859-1
    Input found: name=q, value=
    Input found: name=btnG, value=Google Search
    Input found: name=btnI, value=I'm Feeling Lucky

If you want to scan the JSP source code for forms and inputs, you have to look in a different direction; that is definitely not a job for a "web crawler". Unfortunately, no such static analysis tool comes to mind. The closest you can get is to create a Filter which logs all submitted request parameters.

Map<String, String[]> params = request.getParameterMap();
// ...
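
For the source-scanning direction, a naive text-based scan of the JSP source is about the simplest thing that could work. The following is only a rough sketch, not an existing tool: the class name JspParameterScanner and its method findParameterNames are hypothetical, and it assumes that parameter names appear as plain string literals in request.getParameter(...) / request.getParameterValues(...) calls or as simple EL lookups such as ${param.name}. Names built at runtime, or parameters read elsewhere (servlets, frameworks, request.getParameterMap()), will not be found this way.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects parameter names referenced in JSP source text.
public class JspParameterScanner {

    // request.getParameter("name") or request.getParameterValues("name"),
    // but only when the argument is a plain string literal.
    private static final Pattern PARAM_CALL = Pattern.compile(
            "request\\s*\\.\\s*getParameter(?:Values)?\\s*\\(\\s*\"([^\"]*)\"\\s*\\)");

    // Simple EL lookups such as ${param.name} or ${paramValues.name}.
    private static final Pattern EL_PARAM = Pattern.compile(
            "\\$\\{\\s*param(?:Values)?\\.([A-Za-z_][A-Za-z0-9_]*)");

    // Returns the distinct parameter names matched by either pattern.
    public static Set<String> findParameterNames(String jspSource) {
        Set<String> names = new LinkedHashSet<>();
        for (Pattern pattern : new Pattern[] {PARAM_CALL, EL_PARAM}) {
            Matcher matcher = pattern.matcher(jspSource);
            while (matcher.find()) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Inline sample; in practice the text would be read from the .jsp files.
        String jsp = "<% String q = request.getParameter(\"q\"); %>\n"
                + "<p>${param.lang}</p>\n"
                + "<% String[] ids = request.getParameterValues(\"id\"); %>";
        System.out.println(findParameterNames(jsp));
    }
}
```

To cover a whole web application you would walk the deployment directory and feed each .jsp file's contents to findParameterNames. Treat the result as a lower bound and combine it with the Filter log to see what is actually submitted.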
BalusC
  • Jsoup is in fact a very promising solution. Thank you for your time and for your excellent help! – user164701 Dec 27 '10 at 11:06
  • You're welcome. Jsoup is indeed awesome. See also [pros and cons of leading Java HTML parsers](http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers). – BalusC Dec 27 '10 at 13:03