I'm trying to extract some data from HTML source code in my Java project. The HTML comes from a Bing image search, and I want to get the image URLs from the <a> tags. This is the HTML:

<a href="/images/search?q=nba&amp;view=detailv2&amp;&amp;&amp;
id=FE19E7BB2916CE8B6CD78148F3BC0656D151049A&amp;
selectedIndex=3&amp;
ccid=2%2f7OBkGc&amp;
simid=608035681734625885&amp;
thid=JN.tdPCsRj4HyJzbwA%2bgXsS8g" 
ihk="JN.tdPCsRj4HyJzbwA+gXsS8g" 
m="{ns:&quot;images&quot;,k:&quot;5070&quot;,dirovr:&quot;ltr&quot;,
mid:&quot;FE19E7BB2916CE8B6CD78148F3BC0656D151049A&quot;,
surl:&quot;http://www.nba.com/gallery/rookie/070727_1.html&quot;,
imgurl:&quot;http://www.nba.com/media/draft_class_3_07_070727.jpg
&quot;,
ow:&quot;300&quot;,docid:&quot;608035681734625885&quot;,oh:&quot;192&quot;,tft:&quot;58&quot;}" 
mid="FE19E7BB2916CE8B6CD78148F3BC0656D151049A" 
t1="The 2007 NBA Draft Class" 
t2="625 x 400 · 374 kB · jpeg" 
t3="www.nba.com/gallery/rookie/070727_1.html" 
h="ID=images,5070.1"><img data-bm="16" 
src="https://tse3.mm.bing.net/th?id=JN.tdPCsRj4HyJzbwA%2bgXsS8g&amp;w=217&amp;h=142&amp;c=7&amp;rs=1&amp;qlt=90&amp;o=4&amp;pid=1.1" 
style="width:217px;height:142px;" width="217" height="142">
</a>

This is how I tried to extract them, without success:

public static void main(String[] args) {

        String title = "dog";
        String url =    "https://www.bing.com/images/search?q="+title+"&FORM=HDRSC2";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements img = doc.getElementsByTag("a");

            for (Element el : img) {
                String src1 = el.absUrl("imgurl");
                String src2 = el.absUrl("surl");
                System.out.println(src1 + " " + src2);      
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

Any idea if it's possible?

Pshemo
matt

1 Answer

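As far as I understand, your <a> element has an attribute named m, not imgurl or surl, so el.absUrl("imgurl") and el.absUrl("surl") find no such attribute and return empty strings. The m attribute contains a JSON object, which in turn contains imgurl and surl. So you should first read the JSON from m: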

String m = el.attr("m");

Then parse m as JSON, using any library you like, e.g. Gson:

class MJson {
    private String imgurl;
    private String surl;

    ...
}

MJson mJson = new Gson().fromJson(m, MJson.class);
String src1 = mJson.getImgurl();
String src2 = mJson.getSurl();
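
If adding a JSON library is not an option, the two fields can also be pulled out of m with a regular expression. The following is only a minimal sketch using the Java standard library. It assumes m has the shape shown in the question (unquoted keys, double-quoted string values with no escaped quotes inside them), and the names MFieldExtractor and extractField are made up for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MFieldExtractor {

    // Hypothetical helper: returns the string value stored under `key`
    // in the m attribute, or an empty Optional if the key is absent.
    public static Optional<String> extractField(String m, String key) {
        // Require '{' or ',' before the key so that "url" does not match
        // inside "imgurl" or "surl"; capture everything up to the next quote.
        Pattern p = Pattern.compile(
                "[{,]\\s*" + Pattern.quote(key) + "\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = p.matcher(m);
        // trim() because the sample value in the question has a line break
        // just before its closing quote.
        return matcher.find()
                ? Optional.of(matcher.group(1).trim())
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened copy of the m value from the question, as el.attr("m")
        // would return it (with &quot; already decoded to ").
        String m = "{ns:\"images\",k:\"5070\",dirovr:\"ltr\","
                + "surl:\"http://www.nba.com/gallery/rookie/070727_1.html\","
                + "imgurl:\"http://www.nba.com/media/draft_class_3_07_070727.jpg\","
                + "ow:\"300\"}";
        System.out.println(extractField(m, "imgurl").orElse("(none)"));
        System.out.println(extractField(m, "surl").orElse("(none)"));
    }
}
```

This is more brittle than a real JSON parser: it will break if a value ever contains an escaped quote, so a library is still the safer choice when one is available.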
user3707125