I'm trying to write a very simple iOS app that parses a webpage (http://arxiv.org/list/cond-mat/recent) and displays a simplified version of it. I chose TFHpple to parse the page. I want to extract the paper titles and display them in a table view controller. The HTML container for each paper title looks like this:

<div class="list-title">
<span class="descriptor">Title:</span> Encoding Complexity within Supramolecular Analogues of Frustrated  Magnets
</div>

The function I use to parse the page and get the values is the following (thanks to raywenderlich.com):

- (void) loadPapers{
    NSURL *papersURL = [NSURL URLWithString:@"http://www.arxiv.org/list/cond-mat/recent"];
    NSData *papersHTMLData = [NSData dataWithContentsOfURL:papersURL];

    TFHpple *papersParser = [TFHpple hppleWithHTMLData:papersHTMLData];

    NSString *papersXpathQueryString = @"//div[@class='list-title']";
    NSArray *papersNodes = [papersParser searchWithXPathQuery:papersXpathQueryString];

    NSMutableArray *newPapers = [[NSMutableArray alloc] initWithCapacity:0];

    for (TFHppleElement *element in papersNodes){
        Paper *paper = [[Paper alloc] init];
        [newPapers addObject:paper];

        paper.title = [[element firstChild] content];
    }

    _objects = newPapers;
    [self.tableView reloadData];

}

This function is supposed to parse the entire HTML page and load the titles into the table view. However, when I run it, every Paper built from the papersNodes array ends up with an empty title. The number of elements is correct (~25), but they are all empty, and I am not sure why.

Any help is greatly appreciated! Thanks!

rmaddy
Petr Stepanov
  • If you are not bound to tfhpple, you could give [HTMLKit](https://github.com/iabudiab/HTMLKit) a try. Let me know if you need help with that. – iska Jan 14 '16 at 23:30
  • Yeah, I'm not bound to tfhpple, I just need to get access to the text inside the div and then pass this data into some container to display it later. Could you give some useful links that you find good to learn about HTMLKit? – Petr Stepanov Jan 14 '16 at 23:35

2 Answers

I have rewritten your code with HTMLKit. It looks like this:

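// Download the listing page (synchronous, as in the question's code)
NSURL *papersURL = [NSURL URLWithString:@"http://www.arxiv.org/list/cond-mat/recent"];
NSData *papersHTMLData = [NSData dataWithContentsOfURL:papersURL];
NSString *htmlString = [[NSString alloc] initWithData:papersHTMLData encoding:NSUTF8StringEncoding];

// Parse the HTML string into a DOM document
HTMLDocument *document = [HTMLDocument documentWithString:htmlString];

// Select every <div class="list-title"> with a CSS selector
NSArray *divs = [document querySelectorAll:@"div[class='list-title']"];

// textContent is the concatenated text of the element and its descendants
for (HTMLElement *element in divs) {
    NSLog(@"%@", element.textContent);
}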
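
Since textContent concatenates the text of all descendants, the logged string will presumably include the "Title:" descriptor from the inner span shown in the question. A minimal sketch of stripping it follows; CleanTitle is a hypothetical helper name, not part of HTMLKit, and the "Title:" prefix is assumed from the HTML in the question:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of HTMLKit): trims surrounding whitespace
// and drops a leading "Title:" descriptor if one is present.
static NSString *CleanTitle(NSString *text) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *trimmed = [text stringByTrimmingCharactersInSet:ws];
    NSString *prefix = @"Title:"; // assumed from the question's HTML
    if ([trimmed hasPrefix:prefix]) {
        trimmed = [trimmed substringFromIndex:prefix.length];
    }
    // Trim again to remove the space that followed the descriptor.
    return [trimmed stringByTrimmingCharactersInSet:ws];
}
```

Inside the loop you could then assign paper.title = CleanTitle(element.textContent) instead of logging the raw string.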

Back to your question in the comment:

Could you give some useful links that you find good to learn about HTMLKit?

You can check out the examples on the project's GitHub page. The source code is documented, and using it is relatively straightforward. If you have basic HTML and CSS experience, using HTMLKit should be just as easy. Unfortunately, there are no other resources for learning it yet.

iska
  • Works absolutely great! Thanks for the Kit, Problem solved! – Petr Stepanov Jan 15 '16 at 19:27
  • Hi @iska, is the current version of HTMLKit functional? I tried to use it, but after I added it to the bridging header with `#import "HTMLKit/HTMLKit.h"` I keep getting an error saying that the HTMLElement.h file was not found... – ShP Oct 03 '16 at 21:15
  • @ShP Hey there, it is functional, no worries. I guess you are just mixing Objective-C and Swift imports in your project. Head over to the GitHub issue that you've opened to clear things up. – iska Oct 03 '16 at 23:04

Probably [element firstChild] is returning nil. I suggest you add some NSLog statements to trace the data extraction and help you pinpoint the error.
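
A rough sketch of what that logging could look like, assuming a TFHpple version whose TFHppleElement exposes children, tagName, content and isTextNode. TitleFromElement is a hypothetical helper name, not part of TFHpple. If the first child turns out to be the span or a whitespace-only text node, the title text would sit in a later text child, which this sketch collects:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): logs every direct child of a
// matched div and joins the content of its text-node children.
static NSString *TitleFromElement(TFHppleElement *element) {
    NSMutableString *title = [NSMutableString string];
    for (TFHppleElement *child in element.children) {
        // Shows which child holds the text and which ones have nil content.
        NSLog(@"child tag=%@ content=%@", child.tagName, child.content);
        if ([child isTextNode] && child.content != nil) {
            [title appendString:child.content];
        }
    }
    // Drop the newlines and spaces that surround the title in the markup.
    return [title stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```

In loadPapers, paper.title = TitleFromElement(element) would replace the [[element firstChild] content] call while you inspect the log output.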

Rudi Angela
  • That's right. Basically, this is my question. I guess that the parser does not parse the html-element correctly. So, I'm not sure what's wrong and what is the correct query. Thank you – Petr Stepanov Jan 10 '16 at 11:11
  • Did you test your XPath expression on the target HTML page, outside of your app? In other words: did you verify that the XPath expression is correct? – Rudi Angela Jan 10 '16 at 13:20
  • Yes, I just did. It seems you're right: I'm getting null objects from the [element firstChild] call, but I'm still not sure why. Maybe the span tag messes up the query that I make to reach the text? – Petr Stepanov Jan 11 '16 at 02:46