I'm trying to parse Groovydoc, but Jsoup doesn't find the frameset in which everything is contained.

        Connection connection = Jsoup.connect('http://groovy-lang.org/api.html')
        Document document = connection.get()
        Elements element = document.getElementsByTag('frameset')
        element.each { println(it) }
Alexiy

1 Answer

If you check the result that is returned by connection.get() you can see that there is no frameset tag:

println document

Now, if you open the site in a browser and use the developer tools to look at its HTML, you can see that the frameset you are looking for is inside an iframe whose source is http://docs.groovy-lang.org/latest/html/gapi.

Just load the iframe URL with Jsoup to get the frameset:

Connection connection = Jsoup.connect('http://docs.groovy-lang.org/latest/html/gapi')
Document document = connection.get()
Elements element = document.getElementsByTag('frameset')
element.each { println it }

Or, if you do not want to hardcode the iframe source URL, look at this SO answer on how to get the source URL.
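
As a rough sketch of that second option, the iframe's src can be read from the outer page with Jsoup and resolved to an absolute URL. The helper name firstIframeUrl and the sample markup are made up for illustration, and the jsoup version in @Grab is only an assumption. It also assumes the iframe is present in the HTML the server returns rather than added later by JavaScript, which Jsoup would not see.

```groovy
@Grab('org.jsoup:jsoup:1.15.3') // version is an assumption; any recent jsoup should do
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Hypothetical helper: absolute URL of the first iframe with a src, or null if none.
String firstIframeUrl(Document page) {
    page.selectFirst('iframe[src]')?.absUrl('src')
}

// Offline demonstration on a stand-in page (this markup is invented for the example).
String html = '<html><body><iframe src="/latest/html/gapi"></iframe></body></html>'
Document sample = Jsoup.parse(html, 'http://docs.groovy-lang.org/')
String url = firstIframeUrl(sample)
println url

// Against the live site (needs network access):
// Document outer = Jsoup.connect('http://groovy-lang.org/api.html').get()
// Document inner = Jsoup.connect(firstIframeUrl(outer)).get()
// inner.getElementsByTag('frameset').each { println it }
```

absUrl resolves a relative src against the page's base URI, so the result can be passed straight to Jsoup.connect. If the helper returns null, the page has no iframe in its static HTML and this approach will not work for it.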

Gergely Toth