
From : page to get the names from

I am trying to get people's names from their image tags using jsoup. This is what I have so far:

/**
 * Created by AakarshM on 9/28/2016.
 */

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class JSOUPMAIN {

    public static void main(String[] args) {
        try {
            String url = "http://www.posh24.com/celebrities";
            Document doc = Jsoup.connect(url).get();
            // Selects each whole list entry, so text() returns everything inside it, not just the name.
            Elements entries = doc.select("div.channelListEntry");
            for (Element entry : entries) {
                System.out.println(entry.text());
            }
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}

This at least shows me something: it gives me the name, but with additional info. E.g.:

4 +12 Zayn Malik

I don't need the extra info. How can I fix this?

  • Don't include links in your question; include whatever is relevant in the question itself. – Rishal Sep 28 '16 at 05:40
  • Is this question fully answered? Then please select the best fitting answer or post a follow up question in the comments (see http://stackoverflow.com/help/someone-answers) – Frederic Klein Sep 30 '16 at 08:47

3 Answers


You should be able to get it from the "alt" attribute of the img element.
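As a rough illustration of the "alt" idea, here is a minimal sketch that runs offline against a small inline snippet. The markup in SAMPLE_HTML and the class name AltAttributeSketch are assumptions made up for this example; only the channelListEntry class comes from the question, and the real page may be structured differently.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AltAttributeSketch {

    // Hypothetical markup standing in for one list entry; not copied from the real page.
    static final String SAMPLE_HTML =
            "<div class=\"channelListEntry\">"
          + "<div class=\"image\"><img src=\"/img/1\" alt=\"Zayn Malik\"></div>"
          + "<div class=\"position\">4</div>"
          + "<div class=\"change\">+12</div>"
          + "<div class=\"name\">Zayn Malik</div>"
          + "</div>";

    // Collects the alt text of every img that has an alt attribute inside a list entry.
    static List<String> namesFromAlt(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element img : doc.select("div.channelListEntry img[alt]")) {
            names.add(img.attr("alt"));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : namesFromAlt(SAMPLE_HTML)) {
            System.out.println(name);
        }
    }
}
```

Reading the attribute avoids the entry's other text entirely, but it only works if the site actually puts the person's name in alt.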

bnbrkr

Example Code

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";

Document doc = Jsoup.connect("http://www.posh24.com/celebrities").userAgent(userAgent).timeout(10000).get();

for (Element image : doc.select("#webx_center > div > div > div > a > div.image > img")) {
    System.out.println(image.attr("alt") + "\n\t" + image.attr("abs:src"));
}

Output

Rita Ora
    http://cdn.posh24.com/images/:profile/0a749b802defbf357e7ccf1361ccabef5
Justin Bieber
    http://cdn.posh24.com/images/:profile/081e091efd98b96e82e81a8490a0fb4dd
Rob Kardashian
    http://cdn.posh24.com/images/:profile/083354e61b44581df09f38aaffd5fe901
....

Side-note: see this answer for a short introduction on how to get the css selector: https://stackoverflow.com/a/39632003/1661938

Community
Frederic Klein

Try doc.select("div.channelListEntry div.name");
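To show why the narrower selector helps, here is a small sketch that parses an inline snippet instead of fetching the page. The SAMPLE_HTML markup (including the position and change classes) and the class name NameSelectorSketch are hypothetical; only div.channelListEntry and div.name come from this thread, so check the real page source before relying on it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NameSelectorSketch {

    // Hypothetical markup for two list entries; the real page may differ.
    static final String SAMPLE_HTML =
            "<div class=\"channelListEntry\">"
          + "<div class=\"position\">4</div><div class=\"change\">+12</div>"
          + "<div class=\"name\">Zayn Malik</div>"
          + "</div>"
          + "<div class=\"channelListEntry\">"
          + "<div class=\"position\">5</div><div class=\"change\">-1</div>"
          + "<div class=\"name\">Rita Ora</div>"
          + "</div>";

    // Returns only the text of div.name elements nested inside a list entry.
    static List<String> namesOnly(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element nameDiv : doc.select("div.channelListEntry div.name")) {
            names.add(nameDiv.text());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // The broad selector returns all text in the entry, numbers included.
        for (Element entry : doc.select("div.channelListEntry")) {
            System.out.println("whole entry: " + entry.text());
        }
        // The descendant selector narrows the match to the name element.
        for (String name : namesOnly(SAMPLE_HTML)) {
            System.out.println("name only:   " + name);
        }
    }
}
```

The space in the selector is the descendant combinator, so div.name is matched at any depth inside the entry rather than only as a direct child.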

Antoniossss
  • 31,590
  • 6
  • 57
  • 99