I want to parse HTML content into an NSDictionary.

EDIT:

I only need to parse simple HTML; complex cases do not have to be handled.
On the web side, this content is entered through an HTML editor. Fixing the old web system would require changes in too many places, so as a temporary measure the current version of the app parses the HTML itself.

END:

The HTML looks like this:

<p>hahaha</p><img src="aaaa.jpg"/>heihei<img src="bbb.jpg"/>guagua

The result I want is:

text hahaha
img  aaaa.jpg
text heihei
img  bbb.jpg
text guagua

My code is:

    // Input:  <p>hahaha</p><img src="aaaa.jpg"/>heihei<img src="bbb.jpg"/>guagua
    // Wanted:
    //   NSArray (sort keys) = {0,1,2,3,4}
    //   NSDictionary{Sort-Key,V} = {{0,{text,hahaha}},{1,{img,aaaa.jpg}},{2,{text,heihei}},{3,{img,bbb.jpg}},{4,{text,guagua}}}

    - (NSArray *)RegularExpression:(NSString *)str dic:(NSMutableDictionary **)dic
    {
        if (str == nil) return nil;
        NSString *pgnText = str;
        // Match the content of a <p> or <div> element. (?:p|div) is a
        // non-capturing alternation, so group 1 is the element content.
        NSString *regTags = @"<(?:p|div)\\b[^>]*>(.*?)</(?:p|div)\\s*>";
        NSError *error = nil;
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regTags
                                                                               options:NSRegularExpressionCaseInsensitive
                                                                                 error:&error];
        NSArray *matches = [regex matchesInString:pgnText
                                          options:0
                                            range:NSMakeRange(0, [pgnText length])];

        NSMutableArray *arrItems = [[NSMutableArray alloc] initWithCapacity:[matches count]];
        if (matches.count > 0) {
            // Recurse into the content of each <p>/<div> element.
            for (NSTextCheckingResult *match in matches) {
                NSString *tagValue = [pgnText substringWithRange:[match rangeAtIndex:1]];
                NSArray *arr = [self RegularExpression:tagValue dic:dic];
                [arrItems addObjectsFromArray:arr];
            }
        }
        else {
            // No <p>/<div> left: capture the src value of each <img> tag
            // (straight or curly double quotes).
            NSString *regTags2 = @"<img[^>]*?src\\s*=\\s*[\"”](.*?)[\"”][^>]*/?>";
            NSRegularExpression *regex2 = [NSRegularExpression regularExpressionWithPattern:regTags2
                                                                                    options:NSRegularExpressionCaseInsensitive
                                                                                      error:&error];
            NSArray *matches2 = [regex2 matchesInString:pgnText
                                                options:0
                                                  range:NSMakeRange(0, [pgnText length])];
            // Matches are only logged for now; nothing is added to arrItems or *dic yet.
            for (NSTextCheckingResult *match in matches2) {
                NSString *tagValue = [pgnText substringWithRange:[match rangeAtIndex:1]];
                NSLog(@"%@", tagValue);
            }
        }
        return [arrItems autorelease];
    }

Has anyone implemented something similar?

zt9788
  • Nobody who is sane will try to use regular expressions on full-blown HTML. Why don't you use an actual XML / HTML parser? That will be much easier. – borrrden Sep 17 '13 at 04:08
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Pier-Luc Gendreau Sep 17 '13 at 04:17
  • @borrrden, thank you! Sorry, my network had a problem today, so I only just saw your comment. I have accepted the answer. – zt9788 Sep 17 '13 at 13:19
  • @Pier-LucGendreau Thank you too. I have accepted the answer. – zt9788 Sep 17 '13 at 13:20

1 Answer

Keys in a dictionary must be unique. You cannot have more than one "img" key.
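
One way around the unique-key restriction, offered as a sketch rather than the only option, is to keep the pieces in an NSArray (which preserves order and allows repeats) where each element is a small dictionary holding one type and one value. The key names `type` and `value` and the helpers `MakeItem` and `SampleItems` are made up for illustration; the data is simply the expected result from the question, written out by hand.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps one parsed piece as a {type, value} pair.
static NSDictionary *MakeItem(NSString *type, NSString *value) {
    return @{ @"type": type, @"value": value };
}

// Builds the expected result for the sample HTML in the question by hand.
// The array keeps document order and lets "img" appear more than once.
static NSArray *SampleItems(void) {
    return @[
        MakeItem(@"text", @"hahaha"),
        MakeItem(@"img",  @"aaaa.jpg"),
        MakeItem(@"text", @"heihei"),
        MakeItem(@"img",  @"bbb.jpg"),
        MakeItem(@"text", @"guagua")
    ];
}
```

The array index then plays the role of the sort key, so no separate key list is needed.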

Check out this SO question: Objective-C DOM XML parser for iPhone
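
For a fragment as simple as the one in the question, Foundation's event-based `NSXMLParser` may already be enough. The sketch below makes several assumptions: the input becomes well-formed XML once wrapped in a dummy `<root>` element, the code is compiled with ARC, and the names `SimpleHTMLCollector` and `ParseSimpleHTML` are hypothetical, not part of any library. Input that is not well-formed (unclosed tags, HTML-only entities such as `&nbsp;`) will make the parse fail, in which case the function returns nil.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects text runs and <img src> values in order.
@interface SimpleHTMLCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *items;
@property (nonatomic, strong) NSMutableString *pendingText;
@end

@implementation SimpleHTMLCollector

- (instancetype)init {
    if ((self = [super init])) {
        _items = [NSMutableArray array];
        _pendingText = [NSMutableString string];
    }
    return self;
}

// Emits the buffered text (if any) as one "text" item and clears the buffer.
- (void)flushText {
    NSString *trimmed = [self.pendingText stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if (trimmed.length > 0) {
        [self.items addObject:@{ @"type": @"text", @"value": trimmed }];
    }
    [self.pendingText setString:@""];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    [self flushText];  // any tag boundary ends the current text run
    NSString *src = attributeDict[@"src"];
    if ([elementName caseInsensitiveCompare:@"img"] == NSOrderedSame && src != nil) {
        [self.items addObject:@{ @"type": @"img", @"value": src }];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    [self flushText];
}

// May be called several times for one text run, so append rather than assign.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.pendingText appendString:string];
}

@end

// Returns an ordered array of {type, value} dictionaries, or nil on parse failure.
static NSArray *ParseSimpleHTML(NSString *html) {
    NSString *wrapped = [NSString stringWithFormat:@"<root>%@</root>", html];
    NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    SimpleHTMLCollector *collector = [[SimpleHTMLCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? [collector.items copy] : nil;
}
```

Calling `ParseSimpleHTML` with the sample string from the question should yield the five text/img entries in document order. For messier HTML, a dedicated HTML parser such as those discussed in the linked question is the safer choice.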

KevinS