How to implement this method in my NSXMLParser to extract images

Question

I'm new to iOS development. I have implemented an NSXMLParser, but I don't know how to handle tags that share a name but carry different content, such as <description>. In some feeds this tag contains only the summary; in others it also contains an "img src", which I want to extract too (with or without CDATA).

Examples of description tags from which I need to grab the images and then pass them to my UIImageView:

<description><![CDATA[ <p>Roger Craig Smith and Troy Baker to play Batman and the Joker respectively in upcoming action game; Deathstroke confirmed as playable character. </p><p><img src="http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg"

<description>&lt;img src=&quot;http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg&quot; width=&quot;70&quot; height=&quot;92&quot; hspace=&quot;3&quot; alt=&quot;&quot; border=&quot;0&quot; align=left style="background:#333333;padding:0px;margin:0px 4px 0px 0px;border-style:solid;border-color:#aaaaaa;border-width:1px" /&gt; &lt;p&gt;

I think that @Rob's example solves my case, but I don't know how to fit it into my NSXMLParser, shown below, so that it separates the text from the images. This parser only captures the text (summary).

My NSXMLParser:

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    element = [elementName copy];

    // A new <item> starts: reset the per-item buffers.
    if ([elementName isEqualToString:@"item"])
    {
        elements = [[NSMutableDictionary alloc] init];
        title = [[NSMutableString alloc] init];
        date = [[NSMutableString alloc] init];
        summary = [[NSMutableString alloc] init];
        link = [[NSMutableString alloc] init];
        img = [[NSMutableString alloc] init];
        imageLink = [[NSMutableString alloc] init];
    }

    // Some feeds carry the image URL in a "url" attribute. Keep a mutable copy,
    // because foundCharacters: later calls appendString: on imageLink.
    if ([elementName isEqualToString:@"media:thumbnail"]) {
        NSLog(@"thumbnails media:thumbnail: %@", attributeDict);
        imageLink = [[attributeDict objectForKey:@"url"] mutableCopy];
    }

    if ([elementName isEqualToString:@"media:content"]) {
        NSLog(@"thumbnails media:content: %@", attributeDict);
        imageLink = [[attributeDict objectForKey:@"url"] mutableCopy];
    }

    if ([elementName isEqualToString:@"enclosure"]) {
        NSLog(@"thumbnails Enclosure %@", attributeDict);
        imageLink = [[attributeDict objectForKey:@"url"] mutableCopy];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // Note: the parser may deliver an element's text in several chunks,
    // so each branch appends rather than assigns.
    if ([element isEqualToString:@"title"])
    {
        [title appendString:string];
    }
    else if ([element isEqualToString:@"pubDate"])
    {
        [date appendString:string];
    }
    else if ([element isEqualToString:@"description"])
    {
        [summary appendString:string];
    }
    else if ([element isEqualToString:@"media:description"])
    {
        [summary appendString:string];
    }
    else if ([element isEqualToString:@"link"])
    {
        [link appendString:string];
    }
    else if ([element isEqualToString:@"url"])
    {
        [imageLink appendString:string];
    }
    else if ([element isEqualToString:@"src"])
    {
        [imageLink appendString:string];
    }
    else if ([element isEqualToString:@"content:encoded"])
    {
        // Only works when the whole <img> tag arrives in a single chunk.
        NSString *imgString = [self getImage:string];
        if (imgString != nil) {
            [img appendString:imgString];
            NSLog(@"Content of img:%@", img);
        }
    }
}

- (NSString *)getImage:(NSString *)htmlString
{
    NSString *url = nil;

    NSScanner *theScanner = [NSScanner scannerWithString:htmlString];

    [theScanner scanUpToString:@"<img" intoString:nil];
    if (![theScanner isAtEnd]) {
        [theScanner scanUpToString:@"src" intoString:nil];
        NSCharacterSet *charset = [NSCharacterSet characterSetWithCharactersInString:@"\"'"];
        [theScanner scanUpToCharactersFromSet:charset intoString:nil];
        [theScanner scanCharactersFromSet:charset intoString:nil];
        [theScanner scanUpToCharactersFromSet:charset intoString:&url];
    }
    return url;
}

@end

Rob · Accepted Answer · 2013-05-21T03:12:20.850

In your example you have two description elements, each of which has an img tag embedded in it. Parse the description as you normally would, and then pull out the img tags (using regular expressions, as in my retrieveImageSourceTagsViaRegex below, or a scanner).

Note that you do not have to handle the CDATA and non-CDATA renditions differently. While NSXMLParserDelegate provides a foundCDATA method, I'd be inclined not to implement it. In the absence of foundCDATA, the standard foundCharacters method of NSXMLParser handles both renditions of your description tag (with and without CDATA).
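Applied to a parser shaped like the one in the question, those two points might look like the sketch below. The class name FeedParserDelegate, the items array and the firstImageURLInHTML: helper are assumptions made for illustration, and the simplified regular expression is not the one from this answer. The question's title, date and link handling is left out for brevity.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate class; the question keeps similar state in ivars
// named element, summary and imageLink.
@interface FeedParserDelegate : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *items;     // one dictionary per <item>
@property (nonatomic, copy) NSString *element;           // element currently being read
@property (nonatomic, strong) NSMutableString *summary;  // text of the current <description>
@property (nonatomic, copy) NSString *imageLink;         // first image URL found for the item
@end

@implementation FeedParserDelegate

// Hypothetical helper: returns the src of the first <img> tag in html, or nil.
- (NSString *)firstImageURLInHTML:(NSString *)html
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    return match ? [html substringWithRange:[match rangeAtIndex:1]] : nil;
}

- (void)parserDidStartDocument:(NSXMLParser *)parser
{
    self.items = [NSMutableArray array];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    self.element = elementName;
    if ([elementName isEqualToString:@"item"]) {
        self.summary = [NSMutableString string];
        self.imageLink = nil;
    }
}

// No parser:foundCDATA: method is implemented, so CDATA content and
// entity-encoded content both arrive here as plain characters.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if ([self.element isEqualToString:@"description"]) {
        [self.summary appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    self.element = nil;  // ignore whitespace between elements
    if ([elementName isEqualToString:@"description"] && self.summary != nil && self.imageLink == nil) {
        // The complete description is available now, so search it once.
        self.imageLink = [self firstImageURLInHTML:self.summary];
    } else if ([elementName isEqualToString:@"item"] && self.summary != nil) {
        NSMutableDictionary *item = [NSMutableDictionary dictionary];
        item[@"summary"] = self.summary;
        if (self.imageLink != nil) {
            item[@"imageLink"] = self.imageLink;
        }
        [self.items addObject:item];
        self.summary = nil;
    }
}

@end
```

Searching in didEndElement rather than in foundCharacters matters because the parser may hand over the description in several chunks, and an img tag split across two chunks would otherwise be missed. The summary keeps the full text, so both the text and the image URL end up in the item.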

Consider the following hypothetical XML:

<xml>
    <descriptions>
        <description><![CDATA[ <p>Roger Craig Smith and Troy Baker to play Batman and the Joker respectively in upcoming action game; Deathstroke confirmed as playable character. </p><p><img src="http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg">]]></description>
        <description>&lt;img src=&quot;http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg&quot; width=&quot;70&quot; height=&quot;92&quot; hspace=&quot;3&quot; alt=&quot;&quot; border=&quot;0&quot; align=left style="background:#333333;padding:0px;margin:0px 4px 0px 0px;border-style:solid;border-color:#aaaaaa;border-width:1px" /&gt; &lt;p&gt;</description>
    </descriptions>
</xml>

The following parser will parse both of those description entries, grabbing the image URLs out of them. And as you'll see, there is no special handling for CDATA needed:

@interface ViewController () <NSXMLParserDelegate>

@property (nonatomic, strong) NSMutableString *description;
@property (nonatomic, strong) NSMutableArray *results;

@end

@implementation ViewController

- (void)viewDidLoad
{
    [super viewDidLoad];
    // Do any additional setup after loading the view, typically from a nib.

    NSURL *filename = [[NSBundle mainBundle] URLForResource:@"test" withExtension:@"xml"];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:filename];
    parser.delegate = self;
    [parser parse];

    // full array of dictionary entries

    NSLog(@"results = %@", self.results);
}

- (NSMutableArray *)retrieveImageSourceTagsViaRegex:(NSString *)string
{
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];

    NSMutableArray *results = [NSMutableArray array];

    [regex enumerateMatchesInString:string
                            options:0
                              range:NSMakeRange(0, [string length])
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {

                             [results addObject:[string substringWithRange:[result rangeAtIndex:2]]];
                         }];

    return results;
}

#pragma mark - NSXMLParserDelegate

- (void)parserDidStartDocument:(NSXMLParser *)parser
{
    self.results = [NSMutableArray array];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"description"])
        self.description = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.description)
        [self.description appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"description"])
    {
        NSArray *imgTags = [self retrieveImageSourceTagsViaRegex:self.description];
        NSDictionary *result = @{@"description": self.description, @"imgs" : imgTags};
        [self.results addObject:result];
        self.description = nil;
    }
}

@end

That yields the following results (note, no CDATA):

results = (
        {
        description = " <p>Roger Craig Smith and Troy Baker to play Batman and the Joker respectively in upcoming action game; Deathstroke confirmed as playable character. </p><p><img src=\"http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg\">";
        imgs =         (
            "http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg"
        );
    },
        {
        description = "<img src=\"http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg\" width=\"70\" height=\"92\" hspace=\"3\" alt=\"\" border=\"0\" align=left style=\"background:#333333;padding:0px;margin:0px 4px 0px 0px;border-style:solid;border-color:#aaaaaa;border-width:1px\" /> <p>";
        imgs =         (
            "http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg"
        );
    }
)

So, bottom line: parse the XML as you normally would, don't worry about CDATA, and extract the image URL with an NSScanner or NSRegularExpression, as you see fit.
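For the NSScanner route, a minimal sketch that collects every img src in a description string could look like this. The function name ImageSourcesInHTML is hypothetical, and the sketch assumes quoted src values and no ">" inside attribute values. It would also be confused by attributes such as data-src that appear before src.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the code above): returns the src value of
// every <img> tag in an HTML fragment, in document order.
static NSArray *ImageSourcesInHTML(NSString *html)
{
    NSMutableArray *sources = [NSMutableArray array];
    NSCharacterSet *quotes = [NSCharacterSet characterSetWithCharactersInString:@"\"'"];

    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;  // do not silently skip whitespace
    scanner.caseSensitive = NO;           // match <IMG as well as <img

    while (![scanner isAtEnd]) {
        // Move to the next <img tag; stop when there are none left.
        [scanner scanUpToString:@"<img" intoString:NULL];
        if (![scanner scanString:@"<img" intoString:NULL]) {
            break;
        }

        // Capture the rest of the tag so the src search cannot run into a later tag.
        NSString *tag = nil;
        if (![scanner scanUpToString:@">" intoString:&tag]) {
            continue;
        }

        NSScanner *tagScanner = [NSScanner scannerWithString:tag];
        tagScanner.charactersToBeSkipped = nil;
        tagScanner.caseSensitive = NO;

        [tagScanner scanUpToString:@"src" intoString:NULL];
        if ([tagScanner scanString:@"src" intoString:NULL]) {
            [tagScanner scanUpToCharactersFromSet:quotes intoString:NULL];  // skip "=" and spaces
            [tagScanner scanCharactersFromSet:quotes intoString:NULL];      // skip the opening quote
            NSString *src = nil;
            if ([tagScanner scanUpToCharactersFromSet:quotes intoString:&src]) {
                [sources addObject:src];
            }
        }
    }
    return sources;
}
```

Called from didEndElement with the accumulated description string, it would play the same role as retrieveImageSourceTagsViaRegex. Which of the two to use is mostly a matter of taste.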

I'm sorry for not being clear enough. What I meant is that in some XML files the description tag has images inside the CDATA and in others it does not. My example description tags above are from different RSS feeds, not one XML file with two description tags inside. When I implement the foundCDATA method in my NSXMLParser, it apparently overwrites my summary and grabs the "img src" images, but I need both. Please see my parser here [link](https://dl.dropboxusercontent.com/u/1216970/RSSParser.rtf). Thanks, I really appreciate your help. - Edward, May 21 '13 at 01:37
@Edward You don't have to implement `foundCDATA` at all. If you don't, the standard `foundCharacters` will parse it for you automatically, correctly extracting the characters from the `CDATA` (but eliminating the `CDATA` opening and closing tags). Especially if you have a mix of sometimes `CDATA` and sometimes not, just don't implement `foundCDATA`, and `foundCharacters` will handle both quite gracefully. See my implementation: a single XML file in which one `description` tag has a `CDATA` and the other doesn't, yet the standard `foundCharacters` parsed both perfectly. - Rob, May 21 '13 at 01:56
Let's move this to chat: http://chat.stackoverflow.com/rooms/30287/chat-with-edward - Rob, May 21 '13 at 01:59
