I have an HTML string that I get from a website's response. Everything up to that point works well. What I need to do is grab the only `href` attribute within the HTML. What is the best approach for getting the URL contained in that attribute? I am open to external libraries if necessary; I just want the most efficient way possible. Thanks.
Viewed 1,771 times
2
2 Answers
4
Use this API to parse the HTML code and pick the elements you want.
ElementParser is lightweight framework to provide easy access to xml and html content. Rather than get lost in the complexities of the HTML and XML specifications, it aspires to not obscure their essential simplicity. It doesn’t do everything, it aspires to do “just enough”.
Source: http://touchtank.wordpress.com/element-parser/
Here is an example of how to use ElementParser with your own example. I hope this is helpful.
Merry Xmas! Ho-Ho-Ho
// Create the parser. Don't forget to #import "Element.h" and #import "ElementParser.h"
ElementParser *parser = [[ElementParser alloc] init];
// This is the HTML source code that you want to parse
DocumentRoot *document = [parser parseHTML:@"<html><a href=\"http://google.com\">Google Link</a></html>"];
// Collect all the <a></a> elements into an array
NSArray *elements = [document selectElements:@"a"];
// Iterate through the array and pick the "href" attribute of each element
NSMutableArray *results = [NSMutableArray array];
for (Element *element in elements) {
    NSString *snippet = [element attribute:@"href"];
    // Skip anchors without an href; adding nil to an array raises an exception
    if (snippet != nil) {
        [results addObject:snippet];
    }
}
// Print the results to the console
NSLog(@"%@", [results componentsJoinedByString:@"\n"]);

dimme
- I actually have that framework in my project right now, but I can't figure out how to use it. There is no documentation, just a short paragraph that doesn't go into much detail. Does anybody understand how to use ElementParser? If so, could you show me how to extract this `href` attribute with it? Thanks. – Eli Dec 24 '11 at 14:59
- I will give it a try myself and come back with more details. – dimme Dec 24 '11 at 18:24
- If you want a specific a-element, look through the properties of `element`. You can filter the elements depending on their content. – dimme Dec 24 '11 at 19:25
-1
You could use NSRegularExpression to extract the URL from the HTML tag.
// "https?" matches both http and https ("http?" would make the "p" optional instead)
NSString *regexStr = @"https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?";
NSString *url = @"<a href=\"http://www.stackoverflow.org/\">stackoverflow</a>";
NSError *error = nil;
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:regexStr options:0 error:&error];
if (testRegex == nil) NSLog(@"Error making regex: %@", error);
NSRange range = [testRegex rangeOfFirstMatchInString:url options:0 range:NSMakeRange(0, [url length])];
// The range location is NSNotFound when nothing matches; substringWithRange: would throw on it
NSString *href = (range.location != NSNotFound) ? [url substringWithRange:range] : nil;
Bear in mind that NSRegularExpression requires iOS 4.0 or later.

Ecarrion
- -1 you can't parse HTML with regex. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Steve Dec 24 '11 at 08:04
- If you take a look at the answer, I'm not parsing HTML. As far as the function is concerned, it's just matching a URL in an arbitrary string... – Ecarrion Dec 24 '11 at 14:12
- Which bolsters what I'm saying. Text in the body of the document that happened to contain a similar URL would match, as would text from a tag the OP is not interested in, and so on. If you keep trying to do battle with all these possibilities using a regex, you'll get to a point where not only does it become extremely complex and hard to maintain, but you may also find that what you want to do just can't be done with a regex. – Steve Dec 24 '11 at 15:48
- Also, this URL changes every time because it includes a session ID. Thanks. – Eli Dec 24 '11 at 16:08
- You're right. Reading the question again, I don't know why I assumed the problem was to find the URL in a simple tag like the one held in the `url` variable. If the HTML contains a large amount of data, the function probably won't behave correctly. Thanks! – Ecarrion Dec 24 '11 at 16:08
- Would you have any other suggestions? There is not much HTML in this string, and there is only one `href` attribute in the whole string as well. Thanks. – Eli Dec 24 '11 at 16:51
- If you can guarantee that there is only one href in your HTML, you can change the `regexStr` variable to this: `NSString *regexStr = @"href=\"https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?";` and then add these two lines at the end: `NSArray *components = [href componentsSeparatedByString:@"=\""]; href = [components objectAtIndex:1];` – Ecarrion Dec 24 '11 at 16:59
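For what it's worth, the single-`href` approach from the last comment could also be written with a capture group, which avoids splitting the match on `="` afterwards. This is only a rough sketch under the assumptions stated in the thread: the string contains exactly one `href` and its value is double-quoted. The helper name `FirstHrefInHTML` and the sample URL with its `sid` parameter are made up for illustration and do not come from the original posts.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): returns the value of the first
// double-quoted href attribute in `html`, or nil if none is found.
static NSString *FirstHrefInHTML(NSString *html) {
    if (html == nil) {
        return nil;
    }
    NSError *error = nil;
    // Capture group 1 holds everything between the quotes that follow href=
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href\\s*=\\s*\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Error making regex: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:html options:0 range:NSMakeRange(0, [html length])];
    if (match == nil) {
        return nil; // no href attribute in the string
    }
    return [html substringWithRange:[match rangeAtIndex:1]];
}

int main(void) {
    @autoreleasepool {
        // Sample input; the session-id style query string is an assumption
        NSString *html = @"<a href=\"http://www.stackoverflow.org/?sid=abc123\">stackoverflow</a>";
        NSLog(@"%@", FirstHrefInHTML(html));
    }
    return 0;
}
```

Because the group captures everything up to the closing quote, a URL whose session ID changes on every request should still come back whole. Steve's caveat still applies: this is only reasonable for a small, predictable snippet, not for arbitrary HTML.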