
On some websites, scripts take a while to run, so scraping is unreliable or the HTML returned by the scraper is incomplete. How can I scrape a page only after its scripts have finished running?

I am using URLConnection in Java. When I read the text from it, I get premature HTML: the page has a fairly long script that takes some time to load and then changes the color of some text, and that change is not reflected in the text read through URLConnection.

Bharadwaj

2 Answers


You can use PhantomJS. It is a headless browser, so it runs the JavaScript on the page and gives you the rendered result. You might find this thread useful: "Any Java equivalent to PhantomJS?"

quazar

I have used Selenium in Java (and in Kotlin, via the Java library) for website automation and testing. It can be set up to wait a specified time before looking for elements, or to wait until the page has loaded. Since it really just remote-controls a web browser, you can run JavaScript on pages and act just like any user would.

https://www.seleniumhq.org/download/

https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java

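import org.openqa.selenium.By;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.remote.RemoteWebDriver;

RemoteWebDriver driver = new ChromeDriver();  // starts a real Chrome instance
driver.get(url);                              // url: the page you want to scrape
driver.findElement(By.name("search")).sendKeys("some query");
driver.findElement(By.id("submit")).click();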

You can wait for everything to load as described here: https://stackoverflow.com/a/33349203/9006779 (or at least in a similar way; the API might have changed).
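To make the "wait until it is loaded" idea concrete, here is a minimal sketch in plain Java with no Selenium dependency. It polls a condition until it becomes true or a timeout expires. The names `PageReadyPoller` and `waitUntil` are made up for this illustration, and the "page ready" condition in `main` is only a stand-in for a slow page script, not a real browser check.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class PageReadyPoller {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met in time, false on timeout
     * or if the waiting thread was interrupted.
     */
    public static boolean waitUntil(BooleanSupplier condition,
                                    Duration timeout,
                                    Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Subtraction keeps the comparison valid even if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow page script: "ready" becomes true after about 200 ms.
        long readyAt = System.nanoTime() + Duration.ofMillis(200).toNanos();
        BooleanSupplier pageReady = () -> System.nanoTime() - readyAt >= 0;

        boolean loaded = waitUntil(pageReady, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println("page ready: " + loaded);
    }
}
```

With Selenium, the condition could be something like checking that `((JavascriptExecutor) driver).executeScript("return document.readyState")` returns "complete", or you could use Selenium's own `WebDriverWait` instead of a hand-written loop. Note that `document.readyState` does not cover scripts that keep working after the load event, so for a script that changes a text color it is probably more reliable to wait for that specific element or style to appear.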

Nikky