
I'm trying to design an Android or Windows Phone 7 app (a client) that fetches the news from a website. That website doesn't provide an API or XML files.

My question is: what's the best way to do that? Should I just download the HTML file and parse its content? I'm sorry if my question is a little vague, but I'm not asking for code. I need some guidelines or approaches for doing that.

Note: I won't violate any copyrights; I'm just creating a portal for my University website.

Ateik
  • Tim's answer is most likely to be what you want if your University's website doesn't provide any 'formal' feed (such as RSS, for example). Be aware that it's basically 'web-scraping' which means reverse-engineering the HTML from the pages. This is fine (I do it myself) but is always a risk if the page format(s) get changed at some time in the future. Good luck. – Squonk Sep 01 '11 at 23:53
  • Android or WP7? The platform you're building for will make a big difference to how you do this. – Matt Lacey Sep 02 '11 at 11:36
  • @mistersquonk, thanks, your comment is helpful – Ateik Sep 02 '11 at 14:46
  • @Matt, if you read my question, you would know I'm not asking about code. I'm just asking about the concept. Anyway, thanks. – Ateik Sep 02 '11 at 14:46
  • See http://stackoverflow.com/faq before asking about concepts. – Matt Lacey Sep 02 '11 at 15:29
  • Oh, I see now. So should I delete this question? – Ateik Sep 02 '11 at 16:34

2 Answers


If you go with Windows Phone 7, there is a version of the Html Agility Pack for WP7.

Here is a bit of sample code:

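// Requires the WP7 build of the Html Agility Pack (using HtmlAgilityPack;).
public void Hap()
{
    // Download the page asynchronously; OnCallback runs when loading completes.
    HtmlWeb.LoadAsync("http://www.mycollege.edu/news", OnCallback);
}

private void OnCallback(object s, HtmlDocumentLoadCompleted htmlDocumentLoadCompleted)
{
    var htmlDocument = htmlDocumentLoadCompleted.Document;
    // Use the Agility Pack to parse the news items out of htmlDocument.
}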

Another approach is to have a service of some sort do the scraping and manage the news data. You then control the format that the mobile devices consume, such as XML or JSON, so a change to the website's HTML only has to be fixed in one place.
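As a rough illustration of the service idea, here is a minimal Java sketch of the step where the service turns already-scraped items into JSON for the phone clients. Everything in it is an assumption made for illustration: the `NewsFeedJson` and `NewsItem` names, the `title` and `url` fields, and the `{"items":[...]}` layout are hypothetical, and a real service would more likely use an existing JSON library than hand-rolled escaping.

```java
import java.util.List;

class NewsFeedJson {
    /** One scraped news entry; the two fields are illustrative assumptions. */
    static final class NewsItem {
        final String title;
        final String url;

        NewsItem(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    /** Escapes the characters that must be escaped inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Serializes the items into a small, stable format the clients can rely on. */
    static String toJson(List<NewsItem> items) {
        StringBuilder sb = new StringBuilder("{\"items\":[");
        for (int i = 0; i < items.size(); i++) {
            NewsItem item = items.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"title\":\"").append(escape(item.title))
              .append("\",\"url\":\"").append(escape(item.url)).append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```

The point of the design is that the apps only ever see this format, so if the university redesigns its pages, only the scraping code on the service needs to change.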

Derek Beattie
  • Yeah, I think there's no other way but to parse the HTML, thanks! I liked the service idea, so I can control the format in case the website changes. – Ateik Sep 02 '11 at 14:49
  • For .NET, the Agility Pack seems to be the best solution I've found. – Derek Beattie Sep 02 '11 at 16:48

Check out this question for some clues on HTML parsing: Parse HTML in Android

Spoiler: here is a link that lists some Java HTML parsers that you can try: http://java-source.net/open-source/html-parsers

Depending on the HTML that you are trying to parse, you may have better or worse luck actually getting the content you want out of it.
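To make the download-and-parse approach concrete, here is a small Java sketch that uses only the standard library. It is not a recommendation over the parsers linked above: it swaps in a regular expression for a real HTML parser purely to stay self-contained, and that is more fragile on messy markup. The `HeadlineScraper` class name and the `<h2 class="news-title">` markup are assumptions; the real page will almost certainly use different tags.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadlineScraper {
    // Hypothetical markup: each headline is assumed to look like
    // <h2 class="news-title">Some headline</h2>. Adjust to the real page.
    private static final Pattern HEADLINE = Pattern.compile(
            "<h2[^>]*class=\"news-title\"[^>]*>(.*?)</h2>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Downloads the raw HTML of a page, assuming it is encoded as UTF-8. */
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setConnectTimeout(10000);
        conn.setReadTimeout(10000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Pulls the text of every matching headline element out of the HTML. */
    static List<String> extractHeadlines(String html) {
        List<String> headlines = new ArrayList<>();
        Matcher m = HEADLINE.matcher(html);
        while (m.find()) {
            // Drop nested tags, then collapse runs of whitespace.
            String text = m.group(1)
                    .replaceAll("<[^>]+>", "")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (!text.isEmpty()) {
                headlines.add(text);
            }
        }
        return headlines;
    }
}
```

Usage would be along the lines of `HeadlineScraper.extractHeadlines(HeadlineScraper.fetch("http://www.mycollege.edu/news"))`, run off the UI thread on Android. Keeping the page-specific pattern in one place makes it easier to fix when the site layout changes.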

FoamyGuy