
I am trying to parse this content using jsoup.

<div class="imageInlineCenter" style="width: 468px;" align="center"><img src="http://xbox360media.ign.com/xbox360/image/article/117/1171345/MW3_3_468_1306710207.jpg" align="middle" border="0" height="263" width="468"><div class="inlineImageCaption" style="width: 468px;">Your subwoofer will get a break during the stealthy start of the 'Mind the Gap' level, but only briefly.</div></div>

I only want to read the src attribute of the img tag to get the image URL.

Here's what I am working with right now:

    try {
        Elements img = jsDoc.select("div.imageInlineCenter");
        String imgSrc = img.attr("img src");
        System.out.println(imgSrc);
    } catch (Exception e) {
        Log.e("UPCOMING", "Couldnt retrieve the text");
    }

Nothing is printed. Instead, I get the log message saying that it couldn't retrieve the text.

How can I parse this?

EDIT:

Here is the code I am using.

Neither the catch message nor the System.out output appears.

    try {
        jsDoc = Jsoup.connect(url).get();

        try {
            Elements img = jsDoc.select("div.imageInlineCenter img[src]");
            String imgSrc = img.attr("src");
            System.out.println(imgSrc);
        } catch (Exception e) {
            Log.e("UPCOMING", "Couldnt retrieve the text");
        }
Rahal Kanishka
coder_For_Life22

1 Answer


This is wrong:

String imgSrc = img.attr("img src");

img is a tag, not an attribute; src is the attribute. The attr() method takes only the attribute name, so the tag belongs in the selector instead.

Can't test it right now, but what about something like...

Elements img = jsDoc.select("div.imageInlineCenter img[src]");
String imgSrc = img.attr("src");
System.out.println(imgSrc);
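
To make the tag/attribute distinction concrete, here is a minimal offline sketch, assuming jsoup is on the classpath. It parses a shortened copy of the question's markup with Jsoup.parse instead of fetching the page; the class name ImgSrcSketch and the helper firstImageSrc are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ImgSrcSketch {
    // Hypothetical helper: returns the src of the first <img> inside
    // div.imageInlineCenter, or null if no such element exists.
    public static String firstImageSrc(String html) {
        Document doc = Jsoup.parse(html);
        // The selector names the tag (img); attr() names the attribute (src).
        Element img = doc.select("div.imageInlineCenter img[src]").first();
        return (img == null) ? null : img.attr("src");
    }

    public static void main(String[] args) {
        // Shortened copy of the markup from the question (an assumption for this sketch).
        String html = "<div class=\"imageInlineCenter\" align=\"center\">"
            + "<img src=\"http://xbox360media.ign.com/xbox360/image/article/117/1171345/MW3_3_468_1306710207.jpg\">"
            + "<div class=\"inlineImageCaption\">caption</div></div>";

        System.out.println(firstImageSrc(html));

        // "img src" is not an attribute of the div, so attr() yields an empty
        // string rather than throwing an exception.
        String wrong = Jsoup.parse(html).select("div.imageInlineCenter").attr("img src");
        System.out.println(wrong.isEmpty());
    }
}
```

Returning null when nothing matches makes a wrong selector visible to the caller, instead of silently printing an empty line.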

Edit 1
Regarding "it didn't seem to work...": it seemed to work fine for me. How are you testing this?

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Foo003 {
   private static final String TEST_URL_1 = "http://xbox360.ign.com/" +
        "articles/117/1171345p1.html";

   public static void main(String[] args) {
      Document jsDoc = null;

      try {
         jsDoc = Jsoup.connect(TEST_URL_1).get();
         // System.out.println(jsDoc);

         Elements img = jsDoc.select("div.imageInlineCenter img[src]");
         String imgSrc = img.attr("src");
         System.out.println(imgSrc);

      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}
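
If the article contains several inline images, the same selector can be iterated instead of reading only the first match. The following is a sketch under assumptions: jsoup is on the classpath, and the class name AllImagesSketch, the helper imageUrls, the sample HTML, and the example.com base URI are all illustrative. It uses absUrl("src") so that relative paths are resolved against the base URI passed to Jsoup.parse.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AllImagesSketch {
    // Hypothetical helper: collects the absolute URL of every <img> that
    // sits inside a div.imageInlineCenter block.
    public static List<String> imageUrls(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("div.imageInlineCenter img[src]")) {
            // absUrl resolves a relative src against the document's base URI.
            urls.add(img.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample markup: one absolute and one relative image path.
        String html = "<div class=\"imageInlineCenter\"><img src=\"http://example.com/a.jpg\"></div>"
            + "<div class=\"imageInlineCenter\"><img src=\"/img/b.jpg\"></div>";
        for (String url : imageUrls(html, "http://example.com/articles/1.html")) {
            System.out.println(url);
        }
    }
}
```

When the document comes from Jsoup.connect(url).get(), the base URI is already set from the fetched URL, so absUrl works without passing it explicitly.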
Hovercraft Full Of Eels
  • Again, what web page are you testing? Are you sure that you're checking for the right things? – Hovercraft Full Of Eels Sep 19 '11 at 02:08
  • http://xbox360.ign.com/articles/117/1171345p1.html I've managed to parse the article content. Now I just want to get the images in the article as well. – coder_For_Life22 Sep 19 '11 at 02:10
  • I've also parsed the title correctly =). Also, how can I parse the Game Detail box on the right-hand side? – coder_For_Life22 Sep 19 '11 at 02:10
    @coder_for_life: see edit regarding "it didn't seem to work". Next time, rather than making statements like this which tell us nothing, show us how you're testing it and what if any error messages are shown. – Hovercraft Full Of Eels Sep 19 '11 at 02:16
  • Hi, I know this post is old, but I need some help. I'm trying to parse the images for each article on this web page, http://www.multiplayer.it, but I can't get it to work. Using your code, @HovercraftFullOfEels, with `Elements nodeBlogStats = doc.select("img[src~=(?i)\\.(jpe?g)]");` and `titoli.add(sezione.attr("src"));`, I can display the URL of each image, but not the image itself. This is my question: http://stackoverflow.com/questions/21181685/how-parse-image-with-jsoup?noredirect=1#comment31890231_21181685 Can you help me, please? – David_D Jan 19 '14 at 11:55