I need to scrape some information from a website. The scraper runs, but not all of the information comes back; a lot of data is lost. The following image shows the data I want to scrape:

I used Jsoup to connect to the URL and then tried to extract this data with the following code:

Document doc = Jsoup.connect("https://www.awattar.com/tariffs/hourly#").userAgent("Mozilla/17.0").get();
Elements durationCycle = doc.select("g.x.axis g.tick text");

But the result contained none of that information. So I printed the whole document fetched from the URL, and it shows the following (image: scraped data with the full information missing):

I can see the information when I download the page and read it as an input file, but not when I connect directly to the URL. I want to connect to the URL directly, though. Are there any suggestions?

I hope my question is clear. Let me know if it needs more explanation.

Priya
  • The website is probably running javascript and loading content dynamically. Your browser does execute JS, your scraper does not. – luk2302 Sep 12 '18 at 15:01
  • Thank you so much. Yes, that is true: the website runs JavaScript and loads its content dynamically. Is there a way to scrape it? What can I use? – Priya Sep 13 '18 at 07:11
  • You can use Selenium WebDriver as both your browser engine and your scraper, or you can use Selenium only as the browser and Jsoup as the scraper. https://www.seleniumhq.org/projects/webdriver/ https://stackoverflow.com/questions/27720839/web-scrapping-with-jsoup-and-selenium – Adi Ohana Apr 22 '19 at 15:23
  • See [this related post](https://stackoverflow.com/q/50189638/8583692). – Mahozad Nov 15 '21 at 16:58

1 Answer

Jsoup limits the size of the response body it reads by default, so a large page can be truncated. You should use the maxBodySize parameter:

Document doc = Jsoup.connect("https://www.awattar.com/tariffs/hourly#").userAgent("Mozilla/17.0").maxBodySize(0).get();

A value of 0 means no limit.
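
Two explanations are offered on this page for the missing data: the body was cut off by a size limit, or the content is only created by JavaScript after the page loads. A rough way to tell them apart is sketched below. It is only a heuristic, and the class name `ScrapeDiagnosis`, the method `diagnose`, and the marker string `class="tick"` (guessed from the selector `g.x.axis g.tick text`) are assumptions made for illustration. The check must run on the raw response text, for example the string returned by `Jsoup.connect(...).execute().body()`, because a parsed `Document` is re-serialized with closing tags added.

```java
import java.util.Locale;

/** Hypothetical helper: guesses why expected markup is missing from fetched HTML. */
final class ScrapeDiagnosis {

    enum Cause { PRESENT, TRUNCATED, NOT_IN_STATIC_HTML }

    /**
     * @param html   the raw HTML text exactly as the scraper received it
     * @param marker a fragment expected in the rendered page (an assumption,
     *               e.g. "class=\"tick\"" for the chart's axis ticks)
     */
    static Cause diagnose(String html, String marker) {
        if (html.contains(marker)) {
            return Cause.PRESENT;
        }
        // A complete HTML document normally contains a closing </html> tag;
        // its absence suggests the body was cut off before the end.
        boolean complete = html.toLowerCase(Locale.ROOT).contains("</html>");
        return complete ? Cause.NOT_IN_STATIC_HTML : Cause.TRUNCATED;
    }

    public static void main(String[] args) {
        // Made-up sample inputs, not the real page.
        String full = "<html><body><div id=\"chart\"></div>"
                + "<script src=\"app.js\"></script></body></html>";
        String cut = "<html><body><div id=\"chart\">";
        String marker = "class=\"tick\"";

        System.out.println(diagnose(full, marker)); // NOT_IN_STATIC_HTML
        System.out.println(diagnose(cut, marker));  // TRUNCATED
    }
}
```

If the result is `TRUNCATED`, raising the limit with `maxBodySize(0)` is the likely fix. If it is `NOT_IN_STATIC_HTML`, the markup is probably generated in the browser, and a larger body limit will not help; a browser engine such as Selenium would be needed to render the page first.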

Adi Ohana