I have been programming all day trying to accomplish my goal. At first I tried using regular expressions (regex), but that seemed much too complicated and inefficient, although it did get me part of the way there.

This is the link to the site I'm working with:

http://thewarezscene.org/forums/memberlist.php?start=20    

If you view the page's source (the site seems to be down at the moment), you will notice this recurring link tag:

<a href="http://thewarezscene.org/forums/username-u14088.html">USERNAME</a>

Each new page shows the next part of the list of everyone registered on the site, with the start parameter incrementing by 20 (e.g. start=20, start=40, start=60). I know how to get all elements from an HTML page, but what would be the best way to get the link text for only that specific link format?

43.52.4D.

2 Answers

Use an HTML parser like the HTML Agility Pack to parse the HTML.

What is exactly the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).

Regex is not well suited for parsing HTML as demonstrated in this answer.
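
To illustrate the parser-based approach, here is a minimal sketch. It is written in Python with the standard library's html.parser instead of HAP, only so that it runs without extra downloads; the idea carries over to HAP's DOM. The href pattern is an assumption based on the single example link in the question, and the names ProfileLinkParser and extract_usernames are made up for this example.

```python
import re
from html.parser import HTMLParser

# Assumed pattern, guessed from the one example link in the question:
# .../forums/<name>-u<digits>.html
PROFILE_HREF = re.compile(r"/forums/[^/]+-u\d+\.html$")


class ProfileLinkParser(HTMLParser):
    """Collects the text of <a> elements whose href looks like a profile link."""

    def __init__(self):
        super().__init__()
        self.usernames = []
        self._buffer = None  # text chunks collected while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Attribute values can be None (e.g. a bare "href"), hence the "or".
            href = dict(attrs).get("href") or ""
            if PROFILE_HREF.search(href):
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.usernames.append("".join(self._buffer).strip())
            self._buffer = None


def extract_usernames(html):
    """Return the link text of every profile-style link in the HTML string."""
    parser = ProfileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.usernames


# Hypothetical page fragment: one pagination link and one profile link.
sample = '''
<a href="http://thewarezscene.org/forums/memberlist.php?start=40">Next</a>
<a href="http://thewarezscene.org/forums/username-u14088.html">USERNAME</a>
'''
print(extract_usernames(sample))  # ['USERNAME']
```

The regex here only checks the shape of an attribute value that the parser has already extracted; it is not used to parse the HTML itself. Fetching each page (start=20, start=40, and so on) is left out, since the site was down when the question was asked.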

Oded
  • @Oded How do I get the HTML Agility Pack? Does it already come with the .NET Framework, or is it a library I have to download from somewhere? – 43.52.4D. Aug 09 '12 at 18:25
  • @43.52.4D. - I did provide a link. It does have a download. And even if I didn't you could google for it. Make a bit of an effort please. – Oded Aug 09 '12 at 18:27
  • I did Google it; I just wanted to make sure. And I'm 14 and learning programming by myself, which takes effort. – 43.52.4D. Aug 09 '12 at 18:54
  • @43.52.4D. - Good on you. But asking for a download when a link was provided is not showing effort... – Oded Aug 09 '12 at 18:56
If you want to get all the a elements whose href contains the "start" parameter, you can use this jQuery selector:

$("a[href*='start=']")
Diego ZoracKy