How to extract multiple values from html to java?

Question

I'm trying to extract some data from HTML source code into my Java project. The HTML comes from a Bing image search, and I want to get the image URLs from every <a> tag. This is the HTML:

<a href="/images/search?q=nba&amp;view=detailv2&amp;&amp;&amp;
id=FE19E7BB2916CE8B6CD78148F3BC0656D151049A&amp;
selectedIndex=3&amp;
ccid=2%2f7OBkGc&amp;
simid=608035681734625885&amp;
thid=JN.tdPCsRj4HyJzbwA%2bgXsS8g" 
ihk="JN.tdPCsRj4HyJzbwA+gXsS8g" 
m="{ns:&quot;images&quot;,k:&quot;5070&quot;,dirovr:&quot;ltr&quot;,
mid:&quot;FE19E7BB2916CE8B6CD78148F3BC0656D151049A&quot;,
surl:&quot;http://www.nba.com/gallery/rookie/070727_1.html&quot;,
imgurl:&quot;http://www.nba.com/media/draft_class_3_07_070727.jpg
&quot;,
ow:&quot;300&quot;,docid:&quot;608035681734625885&quot;,oh:&quot;192&quot;,tft:&quot;58&quot;}" 
mid="FE19E7BB2916CE8B6CD78148F3BC0656D151049A" 
t1="The 2007 NBA Draft Class" 
t2="625 x 400 · 374 kB · jpeg" 
t3="www.nba.com/gallery/rookie/070727_1.html" 
h="ID=images,5070.1"><img data-bm="16" 
src="https://tse3.mm.bing.net/th?id=JN.tdPCsRj4HyJzbwA%2bgXsS8g&amp;w=217&amp;h=142&amp;c=7&amp;rs=1&amp;qlt=90&amp;o=4&amp;pid=1.1" 
style="width:217px;height:142px;" width="217" height="142">
</a>

This is how I tried to extract it, without success:

public static void main(String[] args) {

        String title = "dog";
        String url =    "https://www.bing.com/images/search?q="+title+"&FORM=HDRSC2";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements img = doc.getElementsByTag("a");

            for (Element el : img) {
                String src1 = el.absUrl("imgurl");
                String src2 = el.absUrl("surl");
                System.out.println(src1 + " " + src2);      
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

Any idea if it's possible?

At which point does it fail? At `Jsoup.connect`, inside the `for` loop, or somewhere else? — pmverma, Jun 22 '15 at 12:52
The connection works fine, but I get no result from `String src1 = el.absUrl("imgurl"); String src2 = el.absUrl("surl");` — matt matt, Jun 22 '15 at 12:54
Are you using an IDE to develop? If so, try debugging to find the correct expression. — pmverma, Jun 22 '15 at 12:56
Look at the HTML code. Can you tell me how I can get the "imgurl" and "surl" values? — matt matt, Jun 22 '15 at 12:58
`don't know HTML just Java`? You use the web every day; you should at least be familiar with HTML. — pmverma, Jun 22 '15 at 13:02

score 1 · Answer 1 · answered Jun 22 '15 at 13:05

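As far as I understand, your <a> element has an attribute named m, not imgurl or surl. `absUrl` looks up an attribute by name, so `el.absUrl("imgurl")` returns an empty string when no such attribute exists. The m attribute holds a JSON-like object, and imgurl and surl are fields inside it. So you should first read m as a string: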

String m = el.attr("m");

Then parse m as JSON using any library you like, e.g. Gson:

class MJson {
    private String imgurl;
    private String surl;

    // getters and setters
}

MJson mJson = new Gson().fromJson(m, MJson.class);
String src1 = mJson.getImgurl();
String src2 = mJson.getSurl();
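
If you would rather not add a JSON library, the two fields can also be pulled out of m with a regular expression. This is only a minimal sketch of that alternative, not the Gson approach above: the class name `MAttributeFields` and the helper `field` are made up for illustration, and it assumes each value is a double-quoted string with no escaped quotes inside, as in the sample from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MAttributeFields {

    // Hypothetical helper: finds key:"value" in the decoded text of the m attribute.
    static Optional<String> field(String m, String key) {
        // \b anchors the key at a word boundary; [^"]* runs up to the closing quote.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + ":\"([^\"]*)\"");
        Matcher matcher = p.matcher(m);
        if (!matcher.find()) {
            return Optional.empty();
        }
        // The sample imgurl has a line break before its closing quote, so trim it.
        return Optional.of(matcher.group(1).trim());
    }

    public static void main(String[] args) {
        // Shortened copy of the m value from the question, with &quot; already decoded.
        String m = "{ns:\"images\",k:\"5070\","
                + "surl:\"http://www.nba.com/gallery/rookie/070727_1.html\","
                + "imgurl:\"http://www.nba.com/media/draft_class_3_07_070727.jpg\n\","
                + "ow:\"300\"}";
        System.out.println(field(m, "imgurl").orElse("(none)"));
        System.out.println(field(m, "surl").orElse("(none)"));
    }
}
```

Inside the loop from the question this would be called as `field(el.attr("m"), "imgurl")`. jsoup's `attr` returns an empty string for an <a> element that has no m attribute, in which case the helper returns an empty Optional instead of failing.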

answered Jun 22 '15 at 13:05

user3707125


Wow, thanks! But what should I write instead of the `...`, please? – matt matt Jun 22 '15 at 13:11
@mattmatt Getters and setters. – user3707125 Jun 22 '15 at 13:11
I'm still getting `@null`. – matt matt Jun 22 '15 at 13:24
@mattmatt, at which place? – user3707125 Jun 22 '15 at 13:36
In src1 and src2. It looks like Gson is not getting any data at all. – matt matt Jun 23 '15 at 04:15
