
I'm trying to get the text in the span (shown in a screenshot that is not reproduced here) using the code below. However, the output behaves as if the nested spans don't exist.

    Elements tags = document.select("div[id=tags]");

    for (Element tag : tags) {
        Elements child_tags = tag.getElementsByTag("class");

        String key = tag.html();
        System.out.println(key); // only as a test

        for (Element child_tag : child_tags) {
            System.out.println("\t" + child_tag.text());
        }
    }

My output is

      <hr />Tags: 
      <span id="category"></span> 
      <span id="voteSelector" class="initially_hidden"> <br /> </span>      
Rishal
A. Napster
  • Use the selector `span[class=tag]` instead of `div[id=tags]` and you will get your results. – Rishal Nov 22 '17 at 13:17
  • It doesn't work. Try extracting the tags " Hanging Piece", "Unsound Sacrifice" in https://chesstempo.com/chess-problems/15 which is the problem I'm actually working on. – A. Napster Nov 22 '17 at 13:29
  • Please copy the relevant HTML snippet and paste it as text instead of an image link. – Eritrean Nov 22 '17 at 13:37

2 Answers

Elements child_tags = tag.getElementsByTag("class");

With this line you are trying to get elements whose tag name is class, i.e. <class>...</class>, which do not exist. Change that line to:

Elements child_tags = tag.getElementsByClass("tag");

to get the elements whose class attribute is tag, or to:

Elements child_tags = tag.getElementsByTag("span"); 

to get all span elements by tag name.
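To see how the three calls differ, here is a minimal sketch run against a small hard-coded snippet. The HTML is a hypothetical stand-in (the question only shows the real markup as an image), the class name TagLookupDemo and its helper methods are made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class TagLookupDemo {
    // Hypothetical markup: a div with id "tags" that holds two spans of
    // class "tag" plus one span without a class.
    static final String HTML =
        "<div id=\"tags\">Tags: "
        + "<span class=\"tag\">Hanging Piece</span> "
        + "<span class=\"tag\">Unsound Sacrifice</span> "
        + "<span id=\"category\"></span>"
        + "</div>";

    // Parses the snippet and returns the div the question selects.
    static Element tagsDiv() {
        return Jsoup.parse(HTML).getElementById("tags");
    }

    // Counts elements by tag NAME, e.g. "span" or "class".
    static int countByTag(String tagName) {
        return tagsDiv().getElementsByTag(tagName).size();
    }

    // Counts elements by the value of their class ATTRIBUTE.
    static int countByClass(String className) {
        return tagsDiv().getElementsByClass(className).size();
    }

    public static void main(String[] args) {
        System.out.println(countByTag("class"));  // 0: there is no <class> element
        System.out.println(countByClass("tag"));  // 2: the two spans with class="tag"
        System.out.println(countByTag("span"));   // 3: every span, with or without a class
        for (Element span : tagsDiv().getElementsByClass("tag")) {
            System.out.println("\t" + span.text());
        }
    }
}
```

This only helps when the spans are actually present in the HTML that Jsoup parses; if they are missing from the fetched source, none of the three calls will find them.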

Eritrean
  • Both don't work. Have you tested it before you posted your answer? Try extracting the tags " Hanging Piece", "Unsound Sacrifice" in http://chesstempo.com/chess-problems/15 to see if your method works – A. Napster Nov 22 '17 at 14:50

I'm assuming you are running the code against https://chesstempo.com/chess-problems/15 and want the tags displayed on that page (the screenshot originally attached here is not reproduced).

Jsoup only sees the HTML that the server returns, which is what the browser shows as the page source. To confirm, press CTRL+U in the browser; it opens a new window with the exact content that Jsoup will receive. In your case, the part you are trying to retrieve is not present in that source at all.

If the content is rendered by JavaScript, it is not visible to Jsoup, so you have to use something else that runs the JavaScript and gives you the resulting HTML.

Jsoup does not run JavaScript and is not a browser.
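The CTRL+U check can also be done from code. The sketch below uses only the JDK's java.net.http client (Java 11+) instead of Jsoup, downloads the page without running any scripts, and looks for a marker string. The class name RawSourceCheck, its method names, and the default marker "Hanging Piece" are assumptions made for illustration, and the server may answer a non-browser client differently than it answers your browser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RawSourceCheck {
    // True if the marker text occurs in the HTML exactly as the server sent it.
    static boolean appearsInRawSource(String rawHtml, String marker) {
        return rawHtml != null && marker != null && rawHtml.contains(marker);
    }

    // Downloads the response body as text; no JavaScript is executed.
    static String fetchRawSource(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Both defaults are illustrative; pass your own URL and marker as arguments.
        String url = args.length > 0 ? args[0] : "https://chesstempo.com/chess-problems/15";
        String marker = args.length > 1 ? args[1] : "Hanging Piece";

        String raw = fetchRawSource(url);
        System.out.println(appearsInRawSource(raw, marker)
                ? "Marker found in the raw source."
                : "Marker missing from the raw source; it is probably added by JavaScript.");
    }
}
```

A plain substring match is a rough test: the marker could sit inside an inline script as data without ever being an element, so treat a hit as a hint to inspect the source rather than as proof that a selector will match.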

EDIT

There is a workaround using Selenium. Below is working code that gets the rendered source of the URL and extracts the data you are looking for:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class JsoupDummy {
    public static void main(String[] args) {
        // Path to the geckodriver executable; adjust it for your machine.
        System.setProperty("webdriver.gecko.driver", "D:\\thirdPartyApis\\geckodriver-v0.19.1-win32\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        try {
            // Let a real browser load the page so that its scripts run.
            driver.get("https://chesstempo.com/chess-problems/15");
            // Hand the browser's page source to Jsoup for parsing.
            Document doc = Jsoup.parse(driver.getPageSource());
            Elements elements = doc.select("span.ct-active-tag");
            for (Element element : elements) {
                System.out.println(element.html());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always close the browser, even if something failed.
            driver.quit();
        }
    }
}

You need Selenium WebDriver, which drives a real browser and therefore lets you read HTML content that is written by scripts as well.

marc_s
Rishal
  • Do you have an idea what I could use to get the tags? I used the package HtmlUnit which works randomly. Sometimes I can retrieve the tags and sometimes I cannot... – A. Napster Nov 26 '17 at 07:03
  • Man you are awesome! it worked! I think the query with select and .html() helped in this. – Ali Obeid Apr 06 '19 at 17:04