1

I need a regular expression that matches all anchor tags in a page that have a specific CSS class name, in C#/VB.NET.

This is what I have so far:

"<a.*?href=""(.*?)"".*?>(.*?)</a>"

My attempts to add "class=name" aren't working. Also, is it possible to find links where the class name appears either before or after the href with a single expression?

I am familiar with third-party HTML libraries, but they are overkill for what I have in mind, and so is the WebBrowser control.

ambiguousPanda
    I feel obligated to include [this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) answer. Although your goal is more specific, you will probably get more accurate results with a proper HTML parser. – Roman Jan 15 '11 at 12:52

4 Answers

0

I'd do that in two steps:

  1. find all anchor tags with a regular expression
  2. filter out all those that have the incorrect class name.
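
The two steps above could be sketched in C# roughly as follows. This is only an illustration, not a tested production approach: the class name `TwoStepAnchorFilter`, the helper `GetAttribute`, and the patterns themselves are my own assumptions, and the sketch assumes attribute values are double-quoted and contain no `>` character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the two-step idea; names are not from the answer.
public static class TwoStepAnchorFilter
{
    // Step 1: match every anchor element, capturing its attribute text and inner content.
    private static readonly Regex AnchorRegex = new Regex(
        @"<a\b([^>]*)>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the value of a double-quoted attribute, or null if it is absent.
    private static string GetAttribute(string attributes, string name)
    {
        Match m = Regex.Match(
            attributes,
            @"(?<![\w-])" + Regex.Escape(name) + @"\s*=\s*""([^""]*)""",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Step 2: keep only the anchors whose class attribute lists className.
    public static List<string> FindHrefs(string html, string className)
    {
        var hrefs = new List<string>();
        foreach (Match anchor in AnchorRegex.Matches(html))
        {
            string attributes = anchor.Groups[1].Value;
            string classValue = GetAttribute(attributes, "class") ?? "";
            string[] classes = classValue.Split(
                new[] { ' ', '\t', '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries);

            if (Array.IndexOf(classes, className) >= 0)
            {
                string href = GetAttribute(attributes, "href");
                if (href != null) hrefs.Add(href);
            }
        }
        return hrefs;
    }
}
```

Because the class check happens in ordinary code, the order of `class` and `href` inside the tag no longer matters, and an element with several classes (e.g. `class="x test"`) is handled too.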
Martin v. Löwis
0

Better not to try parsing HTML with regexes; use an XML library and XPath expressions instead.
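
As a rough sketch of what that might look like with the framework's own `System.Xml.Linq` and `System.Xml.XPath` classes: this only works under the assumption that the page is well-formed XML (XHTML) without a default namespace, which many real pages are not. The class name `XPathAnchorFinder` is made up for the example, and `className` is assumed not to contain quote characters.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper; assumes well-formed XHTML input with no default namespace.
public static class XPathAnchorFinder
{
    public static List<string> FindHrefs(string xhtml, string className)
    {
        // Throws an XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        // Select <a> elements whose space-separated class list contains className.
        string xpath =
            "//a[contains(concat(' ', normalize-space(@class), ' '), ' "
            + className + " ')]";

        return doc.XPathSelectElements(xpath)
                  .Select(a => (string)a.Attribute("href"))
                  .Where(href => href != null)
                  .ToList();
    }
}
```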

Robokop
0
<a href="(.*?)" class="(.*?)">(.*?)</a>

If you take the second group, that should return the class name. I'm presuming that's what you're after.

Edit: I re-read the question. If you're after a specific class name, replace the second (.*?) with the name you want. E.g. if you're after the class temp, use:

<a href="(.*?)" class="temp">(.*?)</a>

Then take the first group for the link, or the second group for the link text.

If you're using it in C#, you will need to escape the quotes (in a verbatim string, by doubling them). The following should work in C#:

string regex = @"<a href=""(.*?)"" class=""temp"">(.*?)</a>";
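
For illustration, a minimal sketch of how that pattern might be used to pull out the link and the link text. The wrapper class `TempLinkExtractor` is an assumption of mine; the pattern is the one above, unchanged.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern shown above.
public static class TempLinkExtractor
{
    // Returns (href, link text) pairs for anchors written exactly as
    // <a href="..." class="temp">...</a>.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        string regex = @"<a href=""(.*?)"" class=""temp"">(.*?)</a>";
        var links = new List<KeyValuePair<string, string>>();

        foreach (Match m in Regex.Matches(html, regex))
        {
            // Group 1 is the link, group 2 is the link text.
            links.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return links;
    }
}
```

Note that this only matches when `href` comes first and `class` follows it directly, with exactly one space between them. Also, because `(.*?)` can match quote characters, the first group may run across an earlier anchor on the same line that has a different class; using `([^""]*)` for the href group would probably be safer.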
joshhendo
0

Try this:

(?<1><a *?)(?<2>[^>]*?class=")(?<3>test)(?<4>"[^>]*?>)

Then do a Replace with:

$1$2MyClass$4

This works for input such as:

<a class="test" href="http://www.google.com">Test</a>
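
In C#, the replacement might be applied along these lines. This is just a sketch; the class and method names (`ClassRenamer`, `RenameTestClass`) are invented for the example, and the pattern and replacement string are the ones given above, with the quotes doubled for a verbatim string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper showing the Replace step with Regex.Replace.
public static class ClassRenamer
{
    public static string RenameTestClass(string html)
    {
        // Groups 1, 2 and 4 are copied back unchanged; group 3 ("test")
        // is replaced by the literal text "MyClass".
        string pattern =
            @"(?<1><a *?)(?<2>[^>]*?class="")(?<3>test)(?<4>""[^>]*?>)";
        return Regex.Replace(html, pattern, "$1$2MyClass$4");
    }
}
```

Calling `ClassRenamer.RenameTestClass` on the sample anchor should leave everything except the class value untouched.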

Edit: extracting the URL

If you want to extract the URL for a certain class, you'll need to use two expressions, one for each attribute order:

(?<1><a *?)(?<2>[^>]*?class="test"[^>]*? href=")(?<3>[^"]*?)(?<4>"[^>]*?>)

(?<1><a *?)(?<2>[^>]*?href=")(?<3>[^"]*?)(?<4>"[^>]*?class="test"[^>]*?>)

In both expressions the URL is in group 3.

The first one will match:

<a class="test" href="http://www.google.com">Test</a>

The second one will match:

<a href="http://www.google.com" class="test">Test</a>
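
If a single expression is preferred, one possible alternative (a different technique from the two expressions above, not something they do) is to check for the class with a lookahead and then capture the href. The sketch below assumes double-quoted attribute values, an exact match on the whole class attribute, and no `>` inside the tag; `LookaheadLinkFinder` is a made-up name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical single-expression variant using a lookahead for the class.
public static class LookaheadLinkFinder
{
    public static List<string> FindHrefs(string html, string className)
    {
        // (?=...) asserts that class="className" occurs somewhere inside the
        // opening tag without consuming it, so href may come before or after.
        string pattern =
            @"<a\b(?=[^>]*\sclass=""" + Regex.Escape(className) + @""")" +
            @"[^>]*\shref=""([^""]*)""[^>]*>";

        var hrefs = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            hrefs.Add(m.Groups[1].Value); // group 1 is the URL
        }
        return hrefs;
    }
}
```

With this variant the URL is in group 1 rather than group 3.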
Kees C. Bakker