I am trying to retrieve the image URL from this HTML:

<div class="image">
 <a href="http://www.website.com/en/105/News/10217/">
   <img src="/images/cache/105x110/crop/images%7Ccms-image-000005554.gif" 
        width="105" height="110" alt="kollsge (photo: author)" />
 </a>
</div>

This is my code:

HTMLNode *bodyNode = [parser body];

NSArray *imageNodes = [bodyNode findChildTags:@"div"];

for (HTMLNode *imageNode in imageNodes) {
    if ([[imageNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
        NSLog(@"%@", [imageNode getAttributeNamed:@"img src"]);
    } 
}

Help would be much appreciated.

I solved it with this code:

for (HTMLNode *imageNode in imageNodes) {
    if ([[imageNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
        HTMLNode *aNode = [imageNode firstChild];
        HTMLNode *imgNode = [aNode nextSibling];
        HTMLNode *imNode = [imgNode firstChild];
        NSLog(@"%@", [imNode getAttributeNamed:@"src"]);
    }
}
Benjamen
  • Have you tried debugging with step through? – Cole Tobin Jun 08 '12 at 22:43
  • Is this for a plugin? Are you after only the image's URL/`href` attribute value? Why don't you attach it to a DOM node and extract it from there? EDIT: Or maybe it's Objective-C for iOS? – Jared Farrish Jun 08 '12 at 22:43
  • I am just trying to get the url of the image to put in the cell. I managed to put title and content of the data but can't do anything with photos.. – Benjamen Jun 08 '12 at 22:45
  • You're not giving enough information; what is the problem? Do you understand the code? Is this line: `NSLog(@"%@", [imageNode getAttributeNamed:@"img src"]);` logging the `img src` value? Are you trying to get the `img src` value into a variable and you don't know how? What cell? You gotta explain what's going on and what you are after in enough detail so we're not all left guessing. – Jared Farrish Jun 08 '12 at 23:02
  • I assume you're using libxml2, which would be useful information. I would guess that getAttributeNamed:@"img src" is not the correct way to get that piece of data. You need to get the img node then get its src attribute. – podperson Jun 08 '12 at 23:06
  • Yes, I am trying to see the img src value in the console. Afterwards I know how to deal with it. Thanks. – Benjamen Jun 08 '12 at 23:07
  • And you're not seeing the `src` value in the log? Is that the problem? Have you tried `[imageNode getAttributeNamed:@"src"]` instead of `[imageNode getAttributeNamed:@"img src"]`? Note, I removed the `img` from the first. – Jared Farrish Jun 08 '12 at 23:08
  • I guess you're not going to volunteer any potentially helpful information, such as errors, or whatnot. Last guess I have is that `getAttributeNamed:@"img src"` maybe should be `getAttributeName:@"src"` or `getAttributeName:@"img src"`. See: http://stackoverflow.com/questions/3028759/objective-c-passing-a-variable-to-another-ibaction – Jared Farrish Jun 08 '12 at 23:20
  • I am using Ben Reeves' HTMLParser. It worked great with regard to any texts but I cannot figure out how to deal with photos. It just says null. There are no errors. https://github.com/zootreeves/Objective-C-HMTL-Parser – Benjamen Jun 09 '12 at 01:38
  • Git is cool and all, but their code system seriously annoys me because it's so hard to search. If you know a technique, I'd appreciate the tip. I searched for a good thirty minutes straight for those functions, never found that project. Jeez. – Jared Farrish Jun 09 '12 at 03:52
  • I never search anything on Git, because I have the same problem with finding things. I just end up there through searching on other websites. – Benjamen Jun 09 '12 at 04:04

1 Answer

You are not walking the tree correctly. You are asking your div for an attribute named `img src`, which would only exist if the markup looked like this:

<div class="image" img src="whatever">

For one thing, that's not valid HTML, but the more important issue is that you need to look at the div's children. The img you are looking for is an element nested inside the div, and src is an attribute of that element, not of the div. Since your div only has one child, a quick look at the project you linked in the comments leads me to believe that the following will work:

HTMLNode *bodyNode = [parser body];

NSArray *imageNodes = [bodyNode findChildTags:@"div"];

for (HTMLNode *imageNode in imageNodes) {
    if ([[imageNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
        HTMLNode *aNode = [imageNode firstChild];
        HTMLNode *imgNode = [aNode nextSibling];
        NSLog(@"%@", [imgNode getAttributeNamed:@"src"]);
    } 
}
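
Stepping through the tree with firstChild and nextSibling depends on exactly which nodes the parser produces between the tags (the asker's working version needed one more firstChild step than the code above). As a rough alternative sketch, the img could be looked up by tag name instead. This assumes the library's HTMLNode offers a findChildTag: method that searches below the node, as its header appears to; ImageSourcesInHTML is a hypothetical helper name, not something the library provides.

```objc
#import <Foundation/Foundation.h>
#import "HTMLParser.h"   // Ben Reeves' parser, linked in the comments above

// Hypothetical helper, not part of the library: collects the src of the
// first <img> found beneath each <div class="image">.
// Written for manual reference counting, like the library's README;
// under ARC, remove the release calls and keep the parser referenced
// until the loop has finished.
static NSArray *ImageSourcesInHTML(NSString *html) {
    NSError *error = nil;
    HTMLParser *parser = [[HTMLParser alloc] initWithString:html error:&error];
    if (error) {
        NSLog(@"Parse error: %@", error);
        [parser release];
        return [NSArray array];
    }

    NSMutableArray *sources = [NSMutableArray array];
    for (HTMLNode *divNode in [[parser body] findChildTags:@"div"]) {
        if (![[divNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
            continue;
        }
        // Assumption: findChildTag: searches descendants, not only direct
        // children, so the intermediate <a> does not need to be stepped through.
        HTMLNode *imgNode = [divNode findChildTag:@"img"];
        NSString *src = [imgNode getAttributeNamed:@"src"];
        if (src != nil) {
            [sources addObject:src];
        }
    }

    // Nodes point into the parser's document, so release the parser only
    // after the attribute strings have been copied out.
    [parser release];
    return sources;
}
```

If those assumptions hold, passing the markup from the question to this helper should return the relative path stored in the img tag's src. That path would still need to be resolved against the site's base URL before the image can be loaded into a cell.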
borrrden
  • On which line? Saying it has a bad access code is the same as simply saying "It didn't work". I'm sure my theory is correct though: you need the children, not an attribute of the div. – borrrden Jun 09 '12 at 03:01
  • In the HTMLNode.m file: `for(xmlAttrPtr attr = node->properties; NULL != attr; attr = attr->next)` – Benjamen Jun 09 '12 at 03:04
  • Your code helped me to solve the problem I had with getting href though. Thank you. – Benjamen Jun 09 '12 at 03:23
  • Perhaps the image is a sibling of the a and not a child. Kinda makes sense...I will edit my answer above. – borrrden Jun 09 '12 at 03:26
  • I used exactly the same code and it shows me the link. It is not the location of the image, though. Anyway, I will consider it answered. If you have any other advice, I will appreciate it. Thank you once again. – Benjamen Jun 09 '12 at 03:35
  • I thought that didn't look right. I wonder why they decided to use `getAttributeNamed`, with a `d` on the end? How clunky. – Jared Farrish Jun 09 '12 at 03:50
  • I agree, but this is the only htmlParser that I could use. Others are too hard for me to understand. – Benjamen Jun 09 '12 at 03:57