I have the following code to scrape the `href` attribute of every game link on this PlayStation Store page:

https://store.playstation.com/#!/es-...s-store%3Ahome

    String url = "https://store.playstation.com/#!/es-es/ps4/cid=STORE-MSF75508-PS4CAT%7Cplatform~ps4%7Cname~asc/";
    String url2 = "?smcid=nav%3Aps-store%3Ahome";

    int juegos_totales = 0;  // total number of links found
    ArrayList<String> all_links = new ArrayList<String>();
    int z = 0;               // number of entries visited

    for (int i = 1; i < 50; i++) {
        String urlPage = url + i + url2;

        System.out.println("Checking entry: " + urlPage);

        if (getStatusConnectionCode(urlPage) == 200) {
            Document document = getHtmlDocument(urlPage);
            Elements entradas = document.select("div.gridViewportPaneWrapper li.cellGridGameStandard");

            // Walk through each entry and collect the href of every link in it
            for (Element elem : entradas) {
                Elements links = elem.getElementsByTag("a");
                for (Element link : links) {
                    all_links.add(link.attr("href"));
                    juegos_totales++;
                }
                z++;
            }

            System.out.println("There are " + juegos_totales + " games in total");
        }
    }

It scrapes nothing, and I don't know why. If I try to scrape the "PS4" title instead, it works. This code should collect all the links on the page.

JetLagFox
  • Have you checked what's inside `document`? Please check [this answer](http://stackoverflow.com/a/19962060/858913) for more information. – eLRuLL Jan 10 '17 at 14:45
  • @eLRuLL The document contains all the HTML (`Document document = getHtmlDocument(urlPage);`), but the next line returns an empty result: `Elements entradas = document.select("div.gridViewportPaneWrapper li.cellGridGameStandard");`. I am using the same code to parse xbox.com without any problem, and that site also has a login. – JetLagFox Jan 10 '17 at 15:09
  • Have you checked that `document` has all the information from the page you want? [Check this answer](http://stackoverflow.com/a/10472154/858913) to see how to read the `document`. – eLRuLL Jan 10 '17 at 15:25
  • @eLRuLL You are right, the HTML is incomplete. I am reading the links you shared and trying to solve the problem, but I haven't managed to. What I don't understand is why it works on xbox.com, which also has a login. – JetLagFox Jan 10 '17 at 17:26
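
One possible explanation for the incomplete HTML, which the thread itself does not confirm, is the `#!` in the URL. Everything after `#` is a URL fragment, and fragments are handled by the browser and never sent to the server. The sketch below uses only `java.net.URI` to show which part of each page URL would actually be requested. The class name `FragmentCheck` and the helper `requestTarget` are hypothetical names introduced for illustration, not part of the original code.

```java
import java.net.URI;

public class FragmentCheck {

    // Returns the path plus query that an HTTP client would send to the
    // server. The fragment (everything after '#') is never part of it.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return (query == null) ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        // Same URL pieces as in the question.
        String url = "https://store.playstation.com/#!/es-es/ps4/cid=STORE-MSF75508-PS4CAT%7Cplatform~ps4%7Cname~asc/";
        String url2 = "?smcid=nav%3Aps-store%3Ahome";

        for (int i = 1; i <= 3; i++) {
            String urlPage = url + i + url2;
            URI uri = URI.create(urlPage);
            // What the server sees versus what stays on the client side.
            System.out.println("request target: " + requestTarget(urlPage));
            System.out.println("fragment:       " + uri.getRawFragment());
        }
    }
}
```

For every value of `i`, the request target is just `/`, because the page number and the `?smcid=...` suffix both sit inside the fragment. If that is what is happening here, each iteration downloads the same base page, and the game grid would be filled in afterwards by JavaScript that a plain HTML fetch does not execute. That would also be consistent with the selector matching nothing while static parts of the page can still be found.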
