I want to get the data from the data table at the URL mentioned below.

The code works fine for other URLs; it fails only for this one.

This is the web scraping code. The issue is that it does not work for that URL.

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetData {
   public static void main(String[] args) throws InterruptedException {

      String html = "http://programs.dsireusa.org/system/program";
      try {
         Document doc = Jsoup.connect(html).get();
         Elements tableElements = doc.select("table");

         Elements tableHeaderEles = tableElements.select("thead tr th");
         System.out.println("headers");
         Thread.sleep(5000);
         System.out.println(tableHeaderEles.size());

         for (int i = 0; i < tableHeaderEles.size(); i++) {
            System.out.println(tableHeaderEles.get(i).text());
         }
         System.out.println();

         Elements tableRowElements = tableElements.select(":not(thead) tr");

         for (int i = 0; i < tableRowElements.size(); i++) {
            Element row = tableRowElements.get(i);
            System.out.println("row");
            Elements rowItems = row.select("td");
            for (int j = 0; j < rowItems.size(); j++) {
               System.out.println(rowItems.get(j).text());
            }
            System.out.println();
         }

      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}

I expect the output to contain all the data in the data table at this URL. The program works fine for other URLs.

http://programs.dsireusa.org/system/program

Vishnu Dasu
  • What does this have to do with python? – Sayse Jun 06 '19 at 08:42
  • If the page uses JavaScript to load its content, then you will have to use e.g. `Selenium` to control a web browser that can run JavaScript. Or you can use `DevTools` in Chrome/Firefox to find the URL that the JavaScript uses to get data from the server, and then request that URL yourself. JavaScript mostly uses JSON to get data from the server, so you don't have to scrape it. – furas Jun 06 '19 at 08:59
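
A minimal sketch of the second suggestion above, assuming you have already found the data URL in the DevTools Network tab. It uses only the JDK's `java.net.http` client (Java 11+). The class name `JsonEndpointFetcher` and its methods are made up for illustration, and the endpoint is passed on the command line because the real URL for this site is not known here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: not part of jsoup or of the original question.
class JsonEndpointFetcher {

    // Builds a GET request that asks the server for JSON.
    // The endpoint is whatever URL you found in the DevTools Network tab.
    static HttpRequest buildJsonRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body (expected to be JSON).
    static String fetchBody(String endpoint) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildJsonRequest(endpoint), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // No endpoint is hard-coded: pass the one you discovered as the only argument.
        if (args.length != 1) {
            System.out.println("usage: java JsonEndpointFetcher.java <json-endpoint-url>");
            return;
        }
        System.out.println(fetchBody(args[0]));
    }
}
```

The returned string would still need to be parsed with a JSON library of your choice; whether the server requires extra headers or parameters depends on the site and would have to be copied from the request shown in DevTools.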

1 Answer

The problem is that this URL loads its elements after page load (through JavaScript). If you wait a couple of seconds before you scrape, the page should be loaded.

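EDIT: You will need to use something other than a plain HTML parser such as jsoup (or BeautifulSoup), because it only reads the HTML as delivered at page load and does not run JavaScript. You could use Selenium to drive a real browser that renders the page, and then read the data from it.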
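
One rough way to confirm this diagnosis before switching tools is to look at the raw HTML the parser actually received. The sketch below is a heuristic only, and the class `StaticHtmlCheck` and its methods are made up for illustration: if the markup contains `<script` tags but no `<td` cells, the rows are probably filled in by JavaScript. The two sample strings are invented; with jsoup you could pass in something like `doc.outerHtml()` instead.

```java
import java.util.Locale;

// Hypothetical diagnostic helper: a heuristic, not a reliable detector.
class StaticHtmlCheck {

    // True if the raw HTML already contains at least one table cell tag.
    static boolean hasTableCells(String rawHtml) {
        return rawHtml.toLowerCase(Locale.ROOT).contains("<td");
    }

    // True if the page has scripts but no table cells,
    // which suggests the table is rendered on the client.
    static boolean looksClientRendered(String rawHtml) {
        String lower = rawHtml.toLowerCase(Locale.ROOT);
        return lower.contains("<script") && !hasTableCells(rawHtml);
    }

    public static void main(String[] args) {
        // Invented sample inputs, not taken from the site in the question.
        String staticPage =
                "<table><thead><tr><th>Name</th></tr></thead>"
                + "<tbody><tr><td>A</td></tr></tbody></table>";
        String shellPage =
                "<html><body><div id=\"app\"></div>"
                + "<script src=\"app.js\"></script></body></html>";

        System.out.println(looksClientRendered(staticPage)); // prints false
        System.out.println(looksClientRendered(shellPage));  // prints true
    }
}
```

If the check returns true for the fetched page, sleeping before parsing will not help, because the HTML has already been downloaded and nothing will run the scripts.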

DownloadPizza
  • No, it will not be loaded. There's no JavaScript runtime environment that will load it. See [here](https://stackoverflow.com/questions/7488872/page-content-is-loaded-with-javascript-and-jsoup-doesnt-see-it). – Robby Cornelissen Jun 06 '19 at 08:39
  • @RobbyCornelissen The page is written in Angular, so I guess it's not exactly JavaScript. But the fact is that part of the website is loaded dynamically. – DownloadPizza Jun 06 '19 at 08:41
  • It is JavaScript, and, yes, in a browser it will be loaded dynamically because the browser loads it. An HTML parser like JSoup will not. – Robby Cornelissen Jun 06 '19 at 08:42
  • @RobbyCornelissen So that's what you meant. I edited my answer accordingly. Sorry for the misunderstanding. – DownloadPizza Jun 06 '19 at 08:44