
I'm pretty new to iOS development, and I'm trying to parse an RSS (XML) file.

Here is the XML (sorry for the language):

    <item>
        <category>General</category>
        <title>killed in a tractor accident, was critically injured windsurfer</title>
        <description>
        <![CDATA[
        <div><a href='http://www.ynet.co.il/articles/0,7340,L-4360016,00.html'><img src='http://www.ynet.co.il/PicServer3/2012/11/28/4302844/YOO_8879_a.jpg' alt='photo: Yaron Brener' title='Amona' border='0' width='116' height='116'></a></div>
        ]]>
        Tractor driver in his 50s near Kfar Yuval flipped and trapped underneath. Room was critically injured windsurfer hurled rocks because of strong winds and wind surfer after was moderately injured in Netanya
        </description>
        <link>
        http://www.ynet.co.il/articles/0,7340,L-4360016,00.html
        </link>
        <pubDate>Fri, 22 Mar 2013 17:10:15 +0200</pubDate>
        <guid>
        http://www.ynet.co.il/articles/0,7340,L-4360016,00.html
        </guid>
        <tags>Kill, car accidents, surfing</tags>
    </item>

And here is my XML parser code:

    - (void)parserDidStartDocument:(NSXMLParser *)parser
    {
        self.titles = [[NSMutableArray alloc]init];
        self.descriptions = [[NSMutableArray alloc]init];
        self.links = [[NSMutableArray alloc]init];
    }

    - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
    {
        if ([elementName isEqualToString:@"item"]) {
            isItem = YES;
        }

        if ([elementName isEqualToString:@"title"]) {
            isTitle = YES;
            self.titlesString = [[NSMutableString alloc]init];
        }

        if ([elementName isEqualToString:@"description"]) {
            isDesription = YES;
            self.descriptionString = [NSMutableString string];
            self.data = [NSMutableData data];
        }
    }

    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
    {
        if (isItem && isTitle) {
            [self.titlesString appendString:string];
        }
        if (isItem && isDesription) {
            if (self.descriptionString)
                [self.descriptionString appendString:string];
        }
    }

    - (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
    {
        if (self.data)
            [self.data appendData:CDATABlock];
    }

    - (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    {
        if ([elementName isEqualToString:@"item"]) {
            isItem = NO;
            [self.titles addObject:self.titlesString];
            [self.descriptions addObject:self.descriptionString];
        }

        if ([elementName isEqualToString:@"title"]) {
            isTitle = NO;
        }

        if ([elementName isEqualToString:@"description"]) {
            NSString *result = [self.descriptionString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
            NSLog(@"string=%@", result);

            if ([self.data length] > 0)
            {
                NSString *htmlSnippet = [[NSString alloc] initWithData:self.data encoding:NSUTF8StringEncoding];
                NSString *imageSrc = [self firstImgUrlString:htmlSnippet];
                NSLog(@"img src=%@", imageSrc);
                [self.links addObject:imageSrc];
            }

            self.descriptionString = nil;
            self.data = nil;
        }
    }

    - (NSString *)firstImgUrlString:(NSString *)string
    {
        NSError *error = NULL;
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                               options:NSRegularExpressionCaseInsensitive
                                                                                 error:&error];

        NSTextCheckingResult *result = [regex firstMatchInString:string
                                                         options:0
                                                           range:NSMakeRange(0, [string length])];

        if (result)
            return [string substringWithRange:[result rangeAtIndex:2]];

        return nil;
    }

    @end

Like I said, I'm pretty new to iPhone development. I looked for a solution for several hours but found nothing, so I decided to ask. A few questions:

1. The parser does not ignore the CDATA; it just parses everything. Why is this happening? As you can see, the description text itself is not in the CDATA, and that text is the only part I want, but I get the rest too, even when I'm not using `foundCDATA:(NSData *)CDATABlock`.

2. I want to get the image link. How do I do that? I searched online and found a lot of guides that only say to use `foundCDATA:(NSData *)CDATABlock`, but how is it used? Is the way I used it in my code correct?

Please, I need an explanation so I can understand. Thank you!

3 Answers


In answer to your two questions:

  1. If you have implemented `foundCDATA`, the parser will deliver the description's CDATA to that method, not to `foundCharacters`. If, on the other hand, you have not implemented `foundCDATA`, the CDATA will be delivered to `foundCharacters`. So if you don't want `foundCharacters` to receive the CDATA, you have to implement `foundCDATA`.

  2. If you want to extract the img URL, you have to parse the HTML you received somehow. You can use Hpple, but I might just be inclined to use a regular expression:

    - (NSString *)firstImgUrlString:(NSString *)string
    {
        NSError *error = NULL;
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                               options:NSRegularExpressionCaseInsensitive
                                                                                 error:&error];
    
        NSTextCheckingResult *result = [regex firstMatchInString:string
                                                         options:0
                                                           range:NSMakeRange(0, [string length])];
    
        if (result)
            return [string substringWithRange:[result rangeAtIndex:2]];
    
        return nil;
    }
    

    Also see this other Stack Overflow answer in which I demonstrate both Hpple and regex solutions:


As an example, here are NSXMLParserDelegate methods that parse the description, putting the text (excluding the CDATA) in one variable and the image URL from the CDATA in another. You'll have to modify them to fit your process, but hopefully this gives you the basic idea:

    - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
    {
        if ([elementName isEqualToString:@"description"])
        {
            self.string = [NSMutableString string];
            self.data = [NSMutableData data];
        }
    }

    - (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError
    {
        NSLog(@"%s, parseError=%@", __FUNCTION__, parseError);
    }

    // In my standard NSXMLParser routine, I leave self.string `nil` when not parsing
    // a particular element, and initialize it if I am parsing. I do it this way
    // so only my `didStartElement` and `didEndElement` need to worry about the particulars
    // and my `foundCharacters` and `foundCDATA` are simplified. But do it however you
    // want.

    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
    {
        if (self.string)
            [self.string appendString:string];
    }

    - (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
    {
        if (self.data)
            [self.data appendData:CDATABlock];
    }

    - (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    {
        if ([elementName isEqualToString:@"description"])
        {
            // get the text (non-CDATA) portion

            // you might want to get rid of the leading and trailing whitespace

            NSString *result = [self.string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
            NSLog(@"string=%@", result);

            // get the img out of the CDATA

            if ([self.data length] > 0)
            {
                NSString *htmlSnippet = [[NSString alloc] initWithData:self.data encoding:NSUTF8StringEncoding];
                NSString *imageSrc = [self firstImgUrlString:htmlSnippet];
                NSLog(@"img src=%@", imageSrc);
            }

            // once I've saved the data where I want to save it, I `nil` out my
            // `string` and `data` properties:

            self.string = nil;
            self.data = nil;
        }
    }
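To see the split between `foundCharacters` and `foundCDATA` for yourself, here is a small self-contained sketch (assuming ARC and Foundation; the class `SplitRecorder` and the function `RecordDescription` are hypothetical names, not part of the answer above). It records what each callback receives inside `<description>`:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: records what each callback receives inside <description>.
@interface SplitRecorder : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *characters;   // text from foundCharacters:
@property (nonatomic, strong) NSMutableData *cdata;          // raw bytes from foundCDATA:
@property (nonatomic) BOOL inDescription;
@end

@implementation SplitRecorder

- (instancetype)init
{
    if ((self = [super init])) {
        _characters = [NSMutableString string];
        _cdata = [NSMutableData data];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"description"]) self.inDescription = YES;
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"description"]) self.inDescription = NO;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // May be called several times per element, so append rather than assign.
    if (self.inDescription) [self.characters appendString:string];
}

// Because this method is implemented, the CDATA section arrives here as raw
// bytes and does not show up in foundCharacters:.
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    if (self.inDescription) [self.cdata appendData:CDATABlock];
}

@end

// Runs NSXMLParser over an XML string and returns the filled-in recorder.
static SplitRecorder *RecordDescription(NSString *xml)
{
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    SplitRecorder *recorder = [[SplitRecorder alloc] init];
    parser.delegate = recorder;   // NSXMLParser does not retain its delegate
    [parser parse];
    return recorder;
}
```

After parsing a `<description>` shaped like the one in the question, `characters` should hold only the plain text (plus surrounding whitespace) and `cdata` only the `<div>...</div>` markup. According to the answer above, deleting the `foundCDATA:` method should make that markup appear in `characters` instead, which is the behavior the question describes.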
  • First of all, thanks for the answer. But why am I getting all the text from the "description" even when I haven't implemented foundCDATA? Shouldn't the parser skip the CDATA? I need only the text; I'll take the image URL using Hpple. Hope you understand me, thanks again! – OshriALM Mar 23 '13 at 00:14
  • @user1600694 I'm not quite sure what you're trying to achieve, but when you parse XML, the `found...` methods will report all data that is between the open and close `description` tags. The only question is whether you implement both `foundCharacters` and `foundCDATA` (in which case, the two portions of the `description` tag will be returned separately) or just `foundCharacters` (in which case everything between the open and closing tags will be returned by `foundCharacters`). You ask "why am I getting all the text?" That's just how parsers work. – Rob Mar 23 '13 at 00:23
  • If you want to discard the `CDATA` (or perhaps just parse the `src` of the `img`, and discard the rest), then implement `foundCDATA` (in which case `foundCharacters` will exclude the CDATA) and then do whatever you want with the CDATA. – Rob Mar 23 '13 at 00:24
  • @user1600694 I added a sample of what you might want to do if (a) you don't want to return the `CDATA` as part of the `description`, which you achieve by making sure you implement a `foundCDATA` and (b) you want to grab the image URL out of the CDATA. This is just a demonstration of the concept (my example doesn't parse your other elements, doesn't store the results in any model structure, etc.). But it should give you the basic idea. – Rob Mar 23 '13 at 00:56
  • Okay, I get only the text in foundCharacters and the CDATA in foundCDATA. From what I read, the parser should skip the CDATA; that is actually its purpose, to hide the text inside. But according to what you say (and I tried it and it worked), the parser did not ignore it for some reason. :| However, now the question is: how do I get the image link inside the CDATA? With Hpple? – OshriALM Mar 23 '13 at 00:59
  • @user1600694 I'm not understanding the question. If `foundCharacters` is getting the text and `foundCDATA` is getting the CDATA, doesn't that accomplish what you want? Just then extract the portion from the CDATA that you want. I'm not understanding your question. – Rob Mar 23 '13 at 01:29
  • I just implemented both methods (foundCharacters and foundCDATA), and now I collect the text in foundCharacters. Now I want to get the image link from the CDATA caught in foundCDATA. I don't know how to use regex. If you can explain how to take what I got in foundCDATA and analyze it with the regex, I would be grateful. – OshriALM Mar 23 '13 at 01:31
  • @user1600694 Look at the code in my answer. You (a) create a NSString from the `NSMutableData`; and then (b) just call that `firstImgUrlString` method. – Rob Mar 23 '13 at 01:33
  • What do you mean when you write `self.string = [NSMutableString string]; self.data = [NSMutableData data];`? Why not `alloc] init]`? What is the difference? I get the picture link in NSLog, but the app crashes: `'NSInvalidArgumentException', reason: '*** -[__NSArrayM insertObject:atIndex:]: object cannot be nil'`. I added my code. Sorry if I'm driving you crazy, but I have to figure it out; thanks for understanding. – OshriALM Mar 23 '13 at 02:11
  • We should move this to chat, but can't until you have 20 rep, so I'll continue to add comments. Your error is that you're trying to insert a `nil` into your array. Maybe one of your entries doesn't have an `img`? Or maybe you've made some other mistake. Hard to say without seeing latest code. In terms of `alloc`/`init` vs `string`/`data` there's a minor technical difference (non-autoreleased object v autoreleased object), but both work if you're using ARC and accessor methods. – Rob Mar 23 '13 at 04:48
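One likely source of that exception, judging from the question's code: `didEndElement:` sets `self.descriptionString` to `nil` at `</description>`, and the later `</item>` then calls `[self.descriptions addObject:self.descriptionString]` with that `nil`; `[self.links addObject:imageSrc]` would also throw for an item whose CDATA has no `img`. The sketch below (assuming ARC and Foundation; `ItemCollector`, `FirstImageSrc`, and `CollectItems` are hypothetical names) saves each field when its element closes, adds everything at `</item>`, and substitutes `@""` for missing values so the three arrays stay index-aligned:

```objc
#import <Foundation/Foundation.h>

// Same pattern as firstImgUrlString: in the answer; capture group 2 is the src value.
static NSString *FirstImageSrc(NSString *html)
{
    if (html.length == 0) return nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:html options:0 range:NSMakeRange(0, html.length)];
    return match ? [html substringWithRange:[match rangeAtIndex:2]] : nil;
}

@interface ItemCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *titles, *descriptions, *links;  // one entry per <item>
@property (nonatomic, strong) NSMutableString *buffer;   // non-nil only inside <title>/<description>
@property (nonatomic, strong) NSMutableData *cdata;      // CDATA bytes for the current element
@property (nonatomic, copy) NSString *currentTitle, *currentDescription, *currentImage;
@end

@implementation ItemCollector

- (void)parserDidStartDocument:(NSXMLParser *)parser
{
    self.titles = [NSMutableArray array];
    self.descriptions = [NSMutableArray array];
    self.links = [NSMutableArray array];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"item"]) {
        self.currentTitle = nil;          // forget the previous item's values
        self.currentDescription = nil;
        self.currentImage = nil;
    } else if ([elementName isEqualToString:@"title"] || [elementName isEqualToString:@"description"]) {
        self.buffer = [NSMutableString string];
        self.cdata = [NSMutableData data];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    [self.buffer appendString:string];   // no-op while buffer is nil
}

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    [self.cdata appendData:CDATABlock];  // no-op while cdata is nil
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    NSString *text = [self.buffer stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([elementName isEqualToString:@"title"]) {
        self.currentTitle = text;
    } else if ([elementName isEqualToString:@"description"]) {
        self.currentDescription = text;
        NSString *html = [[NSString alloc] initWithData:self.cdata encoding:NSUTF8StringEncoding];
        self.currentImage = FirstImageSrc(html);
    } else if ([elementName isEqualToString:@"item"]) {
        // addObject: throws on nil, so fall back to an empty string.
        [self.titles addObject:self.currentTitle ?: @""];
        [self.descriptions addObject:self.currentDescription ?: @""];
        [self.links addObject:self.currentImage ?: @""];
    }
    if ([elementName isEqualToString:@"title"] || [elementName isEqualToString:@"description"]) {
        self.buffer = nil;
        self.cdata = nil;
    }
}

@end

// Parses an RSS string and returns the collector with its filled-in arrays.
static ItemCollector *CollectItems(NSString *xml)
{
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    ItemCollector *collector = [[ItemCollector alloc] init];
    parser.delegate = collector;   // NSXMLParser does not retain its delegate
    [parser parse];
    return collector;
}
```

Using `@""` as the placeholder is just one choice; `[NSNull null]`, or skipping the whole item, would also avoid the exception, depending on how the arrays are consumed later.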

Answer 1: I will go along with the answer given by Rob for this question.

Answer 2: Just try this to get the image link:

    - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
    {
        // Only fires for a real <img> XML element; markup inside CDATA is never reported here.
        if ([elementName isEqualToString:@"img"]) {
            NSLog(@"%@", [attributeDict objectForKey:@"src"]);
        }
    }
  • On point 2, you'd be absolutely correct if that img tag was an XML tag. Sadly, it's inside the CDATA, which NSXMLParser won't parse (nor is it supposed to). That's the entire purpose of CDATA, to flag content as not to be parsed as XML. – Rob Mar 22 '13 at 23:57
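The comment above can be checked with a short sketch (assuming ARC and Foundation; `ImgElementSpy` and `ImgSourcesSeenByParser` are hypothetical names). It collects the `src` of every `img` element that `didStartElement:` reports, so the same markup can be fed in once as real XML and once wrapped in CDATA:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the src attribute of every <img> element the parser reports.
@interface ImgElementSpy : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *sources;
@end

@implementation ImgElementSpy

- (instancetype)init
{
    if ((self = [super init])) {
        _sources = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    NSString *src = attributeDict[@"src"];
    if ([elementName isEqualToString:@"img"] && src) {
        [self.sources addObject:src];
    }
}

@end

// Returns the img src values that NSXMLParser reported as elements.
static NSArray *ImgSourcesSeenByParser(NSString *xml)
{
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    ImgElementSpy *spy = [[ImgElementSpy alloc] init];
    parser.delegate = spy;   // NSXMLParser does not retain its delegate
    [parser parse];
    return spy.sources;
}
```

With `<d><img src='a.jpg'/></d>` the `img` is an element and its `src` is collected; with `<d><![CDATA[<img src='a.jpg'>]]></d>` nothing is collected, because the CDATA content is handed over as character data rather than parsed as elements.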

The image link you are looking to extract is inside a CDATA block, and NSXMLParser does not parse the contents of a CDATA block as XML.

If you need to extract the string from the CDATA, you could use this code in foundCDATA:

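    - (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
    {
        // CDATABlock holds the raw bytes between <![CDATA[ and ]]>
        NSMutableString *cdstring = [[NSMutableString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding];
        // cdstring is a local variable: store it (e.g. in a property) before this method returns
    }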

Now the mutable string `cdstring` will contain:

    <div>
    <a href='http://www.ynet.co.il/articles/0,7340,L-4360016,00.html'>
    <img src='http://www.ynet.co.il/PicServer3/2012/11/28/4302844/YOO_8879_a.jpg' alt='photo: Yaron Brener' title='Amona' border='0' width='116' height='116'>
    </a>
    </div>

Now just search for `src='` using `rangeOfString:` and extract the image link, or use a web view to simply display the HTML:

    NSString *htmlContent = [NSString stringWithFormat:@"%@", cdstring];
    [webView loadHTMLString:htmlContent baseURL:nil];
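The plain string search suggested in this answer can be sketched as follows. It looks for the `img` tag's `src='` (the `href='` in this feed is the article link, not the image) and assumes single-quoted attributes, as in the sample feed; `ImageSrcBySearching` is a hypothetical name, and a feed using double quotes would need the regex approach from the other answer instead:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first `src='` and the next
// single quote, or nil if either marker is missing. Assumes single-quoted attributes.
static NSString *ImageSrcBySearching(NSString *html)
{
    NSRange start = [html rangeOfString:@"src='" options:NSCaseInsensitiveSearch];
    if (start.location == NSNotFound) return nil;

    NSUInteger from = NSMaxRange(start);   // first character of the URL
    NSRange end = [html rangeOfString:@"'" options:0 range:NSMakeRange(from, html.length - from)];
    if (end.location == NSNotFound) return nil;

    return [html substringWithRange:NSMakeRange(from, end.location - from)];
}
```

Called with the `cdstring` built in `foundCDATA:` above, this should return the `PicServer3/...jpg` URL from the sample item.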