
I'm trying to write a web scraper for the League of Legends champions page. As a first step, I want to get a list of all champions from here: http://gameinfo.euw.leagueoflegends.com/en/game-info/champions/ .

However, I can't figure out why my loop isn't working.

Here is my code:

    Document doc = Jsoup.connect("http://gameinfo.euw.leagueoflegends.com/en/game-info/champions").get();
    Elements span = doc.select("div#champion-grid-container > div.content-border > div#champion-grid-content > div.rg-box-container rg-display-Riot.champions.GridView > ul > li");
    if (span != null) {
        System.out.println("The class grid exist!!");
        Elements lista = span.select("li#champion-grid-aatrox");
        if (lista != null) {
            System.out.println("li#champion-grid-aatrox Exist!!");
        } else {
            System.out.println("Nop :(");
        }
        Elements maidep = lista.select("div.champ-name");
        if (maidep != null) {
            System.out.println("div.champ-name Exist!!");
        } else {
            System.out.println("Nop :(");
        }
        Elements maidep2 = maidep.select("a[href]");
        if (maidep2 != null) {
            System.out.println("a Exist!!");
        } else {
            System.out.println("Nop :(");
        }
        for (Element nuj : maidep2) {
            System.out.println("Content is " + nuj.text());
        }
    } else {
        System.out.println("Class Grid Nop:(");
    }

I know it's bad practice to select divs like that. At first I tried to reach that particular element with a single select, but it returned nothing, so I walked down through each div/parent to see where the selection got lost. The output is this:

    The class grid exist!!
    li#champion-grid-aatrox Exist!!
    div.champ-name Exist!!
    a Exist!!

So the "Content is" message is not even displayed.

Boombastic
  • Not directly related, but there is a League of Legends API you can use for this same information - which is usually faster than scraping. See: https://developer.riotgames.com/static-data.html – nbokmans Apr 18 '17 at 12:01
  • is that an empty list? – Exceptyon Apr 18 '17 at 12:03
  • @rmlan I edited the post, I hope it's OK now. nbokmans: I didn't know about that API, but I'd like to keep the scraper approach for learning purposes. Exceptyon: it may be, but why would it be an empty list if the element exists and has content? – Boombastic Apr 18 '17 at 12:07

1 Answer


According to the JSoup documentation, Elements.select()

returns the filtered list of elements, or an empty list if none match.

so checking whether the result is null tells you nothing about whether your selector matched anything. Check elements.isEmpty() (or elements.size() > 0) instead, and you're likely to discover that one of your selectors matched nothing.
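A minimal, self-contained sketch of the difference (the HTML snippet here is made up for illustration; it assumes Jsoup on the classpath):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorCheck {
    public static void main(String[] args) {
        // A tiny stand-in document; the real page would be fetched with Jsoup.connect(...).get()
        Document doc = Jsoup.parse("<ul><li id=\"champion-grid-aatrox\">Aatrox</li></ul>");

        Elements hit  = doc.select("li#champion-grid-aatrox");
        Elements miss = doc.select("li#champion-grid-nosuchchampion");

        // select() never returns null, even when nothing matches...
        System.out.println(hit == null);     // false
        System.out.println(miss == null);    // false

        // ...so isEmpty() is the check that actually tells you something
        System.out.println(hit.isEmpty());   // false
        System.out.println(miss.isEmpty());  // true
    }
}
```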

You can use this site to try your selectors in real time and save yourself some time.

Bartek Maraszek
  • Thank you sir, that is really helpful. I've been testing selectors and I've found out that for some reason none of the elements of one particular div can be accessed. Please look at this image: http://imgur.com/a/Vh9u2 . The div#champion-grid-content was detected, but I can't access the next nested div, nor the ul, nor the li elements, as if they don't exist. Do you know why? – Boombastic Apr 18 '17 at 13:13
  • In the parsed output of the url, it looks like the UL is empty, even though it should have over 100 li elements. – Boombastic Apr 18 '17 at 13:21
  • Probably because the content is dynamically generated by JS on the client side and you're getting the page containing JS code before it was executed. See this question: http://stackoverflow.com/questions/7488872/page-content-is-loaded-with-javascript-and-jsoup-doesnt-see-it – Bartek Maraszek Apr 18 '17 at 14:57
  • One way to resolve the JS code would be to use a Selenium Webdriver. See this question: http://stackoverflow.com/questions/35243516/java-selenium-storing-updated-page-source-after-javascript-activation – Bartek Maraszek Apr 18 '17 at 15:00
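The Selenium route from the comments above could look roughly like this: let a real browser execute the page's JavaScript, then hand the rendered HTML to Jsoup. This is only a sketch, assuming the selenium-java dependency plus a chromedriver binary on the PATH; the `li[id^=champion-grid-]` selector is my guess based on the ids discussed in the question:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class RenderedScrape {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // The browser runs the page's JS, so the champion grid gets filled in
            driver.get("http://gameinfo.euw.leagueoflegends.com/en/game-info/champions/");
            // Hand the rendered markup to Jsoup to keep the familiar select() API
            Document doc = Jsoup.parse(driver.getPageSource());
            Elements champions = doc.select("li[id^=champion-grid-]");
            System.out.println("Matched " + champions.size() + " champions");
        } finally {
            driver.quit();
        }
    }
}
```

Since the grid is filled in asynchronously, in practice you may also need an explicit wait (e.g. WebDriverWait) before grabbing the page source.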