
Given a URL of a webpage, I need to get the HTML between an opening <div> and a closing </div> of a particular class.

I think that if I can get the whole HTML source of the page as a string, I could use a regex to extract the HTML between the opening and closing tags of that <div> class and return it as a string.

How could we achieve this using Objective-C and RegExes?

Yatharth Agarwal
Lewis

2 Answers


For the parsing part, I have 3 words for you:

Don't try it

Read Parsing HTML the Cthulhu Way (by Jeff Atwood himself) and see this ever-famous SO answer. For libraries, use HTML::Sanitizer.

On the other hand, most programs will neither need to, nor should, anticipate the entire universe of HTML when parsing. In fact, designing a program to do so may well be a completely wrong-headed approach, if it changes a program from a few-line script to a bullet-proof commercial-grade program which takes orders of magnitude more time to properly code and support. Resource expenditure should always (oops, make that very frequently, I about overgeneralized, too) be considered when creating a programmatic solution. In addition, hard boundaries need not always be an HTML-oriented limitation. They can be as simple as "work with these sets of web pages", "work with this data from these web pages", "work for 98% users 98% of the time", or even "OMG, we have to make this work in the next hour, do the best you can".

So if you're parsing something small and predictable like icanhazip, or static content whose markup won't change, a regex might work. That's for you to choose. Good luck!
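For the "small, known page" case, here is a minimal sketch of the extraction step using NSRegularExpression. The helper name InnerHTMLOfDiv is made up for illustration and is not from the original posts. It assumes the target <div> has no nested <div> inside it, that its class attribute holds exactly one class name, and that the tag is written in lowercase. If any of those assumptions fails, the result will be wrong, which is exactly the fragility described above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the inner HTML of the first <div> whose class
// attribute is exactly className, or nil if there is no such <div>.
// Assumption: the target <div> contains no nested <div> elements.
static NSString *InnerHTMLOfDiv(NSString *html, NSString *className) {
    // Escape the class name so regex metacharacters in it are matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:className];
    NSString *pattern = [NSString stringWithFormat:
        @"<div[^>]*\\bclass\\s*=\\s*[\"']%@[\"'][^>]*>(.*?)</div>", escaped];

    NSError *error = nil;
    // DotMatchesLineSeparators lets ".*?" span multiple lines of markup.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:pattern
                             options:NSRegularExpressionDotMatchesLineSeparators
                               error:&error];
    if (regex == nil) {
        return nil;
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:html
                          options:0
                            range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }
    // Capture group 1 is the text between the opening tag and the first </div>.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

To get the page source in the first place, [NSString stringWithContentsOfURL:encoding:error:] returns it as a string. That call blocks until the download finishes, so it should not run on the main thread. The result can then be passed to the helper, for example InnerHTMLOfDiv(html, @"content").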

Community
Yatharth Agarwal

You can check whether a string matches a regex with NSPredicate.

This code checks whether _text is an email address:

- (BOOL)CheckInput:(NSString *)_text
{
    // Simple email pattern; MATCHES tests the whole string against it.
    NSString *Regex = @"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}";
    NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", Regex];
    return [emailTest evaluateWithObject:_text];
}
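One caveat worth knowing: SELF MATCHES succeeds only when the pattern matches the entire string, and it returns a yes/no answer rather than the matched text. So a predicate can validate input, but it cannot by itself pull the contents out of a <div>. A small sketch of that behaviour, using a hypothetical free function IsEmailAddress that wraps the same pattern:

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the email pattern shown above.
static BOOL IsEmailAddress(NSString *text) {
    NSString *regex = @"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}";
    NSPredicate *emailTest =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    // YES only if the whole of text matches the pattern.
    return [emailTest evaluateWithObject:text];
}

// IsEmailAddress(@"user@example.com")           is YES
// IsEmailAddress(@"Contact: user@example.com")  is NO, because the extra
// text around the address means the full string does not match.
```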
OpenThread