URL: https://stats.nba.com/player/1628381/defense-dash/

Attempting to get:

 `<table>
  <tbody>
    <!----><tr data-ng-repeat="(i, row) in page" index="0">
      <td class="player">Overall</td>
      <td>45</td>
      <td>45</td>
      <td>5.7</td>
      <td>12.3</td>
      <td>46.6</td>
      <td>100%</td>
      <td>46.7</td>
      <td>-0.1</td>
    </tr><!---->
  </tbody>
</table> `

My code:

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void getData(String url, String Name, int ID) throws IOException
{
    String html = Jsoup.connect(url).execute().body();
    // Strip comment markers so that tables wrapped in HTML comments become parseable
    html = html.replaceAll("<!---->", "");
    html = html.replaceAll("<!--", "");
    html = html.replaceAll("-->", "");
    Document doc = Jsoup.parse(html);
    Elements tableElements = doc.select("table");

    System.out.println("Elements " + tableElements);

    for (Element tableElement : tableElements)
    {
        // Skip tables that have no id attribute
        String tableId = tableElement.id();
        if (tableId.isEmpty()) {
            continue;
        }
        String fileName = "table" + Name + tableId + ID + ".csv";
        System.out.println(fileName);
        File outFile = new File("C:\\Users\\noman\\eclipse-workspace\\Senior Project\\src\\", fileName);

        // try-with-resources closes the writer even if an exception is thrown
        try (FileWriter writer = new FileWriter(outFile)) {
            // Body rows only; header rows inside <thead> are skipped
            Elements tableRowElements = tableElement.select("tbody tr");

            for (int i = 0; i < tableRowElements.size(); i++) {
                Element row = tableRowElements.get(i);
                Elements rowItems = row.select("td");
                for (int j = 0; j < rowItems.size(); j++) {
                    writer.append(rowItems.get(j).text());

                    // Comma between cells, but not after the last one
                    if (j != rowItems.size() - 1) {
                        writer.append(',');
                    }
                }
                writer.append('\n');
            }
        }
    }
}

The problem is that no elements are found. The same code works perfectly on another site, which (seemingly) stores its data in the same way.

Is something about this website preventing web scraping, or is there a subtle difference I am missing?

Please note that the HTML shown above is a shortened version.

Novabomb
  • https://stackoverflow.com/questions/35586658/how-to-access-updated-html-source-after-the-javascript-on-the-page-has-been-exec – mavriksc Feb 27 '19 at 21:56
  • Using my browser's debugger (Network tab), I checked that the data is loaded dynamically from this URL: https://stats.nba.com/stats/playerdashptshotdefend?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&Period=0&PlayerID=1628381&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision= Jsoup does not parse JSON, so you have to use a different library. – Krystian G Feb 28 '19 at 00:08
  • https://stackoverflow.com/tags/jsoup/info – Luk Feb 28 '19 at 11:22
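
Following up on Krystian G's comment: since the table is filled in by JavaScript, one option is to skip the HTML page and request the JSON endpoint directly. Below is a minimal sketch of that idea, assuming Java 11+ and only the JDK's built-in `java.net.http.HttpClient`. The class name `DefenseDashFetcher` and the methods `buildUrl` and `fetchJson` are hypothetical names chosen for illustration; the query parameters are copied from the URL in the comment. The `User-Agent` and `Accept` headers are an assumption about what the server may expect, not something confirmed in this thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class DefenseDashFetcher {

    // Endpoint taken from the URL quoted in the comment above
    static final String ENDPOINT = "https://stats.nba.com/stats/playerdashptshotdefend";

    // Rebuilds the query string from the comment, varying only player and season
    static String buildUrl(int playerId, String season) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("DateFrom", "");
        params.put("DateTo", "");
        params.put("GameSegment", "");
        params.put("LastNGames", "0");
        params.put("LeagueID", "00");
        params.put("Location", "");
        params.put("Month", "0");
        params.put("OpponentTeamID", "0");
        params.put("Outcome", "");
        params.put("PORound", "0");
        params.put("PerMode", "PerGame");
        params.put("Period", "0");
        params.put("PlayerID", String.valueOf(playerId));
        params.put("Season", season);
        params.put("SeasonSegment", "");
        params.put("SeasonType", "Regular Season");
        params.put("TeamID", "0");
        params.put("VsConference", "");
        params.put("VsDivision", "");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URLEncoder turns the space in "Regular Season" into '+'
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return ENDPOINT + "?" + query;
    }

    // Downloads the raw JSON text; the headers are an assumption (see above)
    static String fetchJson(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0")
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Prints the request URL only; call fetchJson(...) to actually download
        System.out.println(buildUrl(1628381, "2018-19"));
    }
}
```

The string returned by `fetchJson` is JSON rather than HTML, so it would then go to a JSON library of your choice instead of `Jsoup.parse`. If you would rather keep working with the rendered table, the question linked in the first comment covers getting the page source after JavaScript has run.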