As titled, I'm trying to read the content of sites like this one, which appears to be JavaScript-based.
I tried the plain JDK library, then jsoup, and then HtmlUnit, but I couldn't get anything useful out of any of them (I get just the page source, just the title, or null):
// Attempt 1: plain JDK (java.net.URL + java.util.Scanner)
val url = URL("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate")
val connection = url.openConnection()
val scanner = Scanner(connection.getInputStream())
scanner.useDelimiter("\\Z") // read the whole response as a single token
val content = scanner.next()
scanner.close()
println(content)

// Attempt 2: jsoup
val doc = Jsoup.connect("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate").get()
println(doc.text())

// Attempt 3: HtmlUnit with the default browser version
WebClient().use { webClient ->
    val page = webClient.getPage<HtmlPage>("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate")
    val pageAsText = page.asNormalizedText()
    println(pageAsText)
}

// Attempt 4: HtmlUnit emulating Firefox
WebClient(BrowserVersion.FIREFOX).use { webClient ->
    val page = webClient.getPage<HtmlPage>("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate")
    println(page.textContent)
}
It should be something easy-peasy, but I can't see what's wrong.