11

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I have an HTML page containing links of this form:

<a class="development" href="[variable content]">X</a>

The [variable content] differs from link to link; the rest is identical.
What regex will match all of those links? (I did try, although I am not posting my attempts here.)

Itay Moav -Malimovka

5 Answers

5

Try this regular expression:

<a class="development" href="[^"]*">X</a>
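As a minimal sketch of how this pattern behaves, here it is applied with Python's `re` module. The sample markup and the names `LINK_RE` and `html` are made up for illustration; the question does not say which language or page is involved.

```python
import re

# The pattern from this answer: [^"]* consumes everything up to the closing quote.
LINK_RE = re.compile(r'<a class="development" href="[^"]*">X</a>')

# Hypothetical sample page, invented for this example.
html = (
    '<p><a class="development" href="/a">X</a> and '
    '<a class="other" href="/b">X</a> and '
    '<a class="development" href="/c?x=1">X</a></p>'
)

# No capture groups, so findall returns the full text of each match.
matches = LINK_RE.findall(html)
print(matches)
```

Only the two `class="development"` links are returned; the `class="other"` link is skipped because the literal prefix does not match.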
Gumbo
  • Single-quoted attributes are also valid HTML. And, depending on the source, you may even get invalid HTML, at which point you're screwed. – kch May 04 '09 at 20:02
5

What about the non-greedy version:

<a class="development" href="(.*?)">X</a>
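A small sketch, again in Python and with invented sample strings, of where the non-greedy form and the `[^"]*` form can differ. Because `.` also matches `"`, the lazy group is free to run past the closing quote when the first link's text is not `X`. The names `LAZY_RE` and `STRICT_RE` are illustrative only.

```python
import re

LAZY_RE = re.compile(r'<a class="development" href="(.*?)">X</a>')
STRICT_RE = re.compile(r'<a class="development" href="([^"]*)">X</a>')

# Hypothetical inputs, invented for this example.
ok = '<a class="development" href="/a">X</a>'
tricky = '<a class="development" href="/a">Y</a> <a class="other" href="/b">X</a>'

lazy_ok = LAZY_RE.findall(ok)              # the captured href values
lazy_tricky = LAZY_RE.findall(tricky)      # .*? spills across two tags
strict_tricky = STRICT_RE.findall(tricky)  # [^"]* cannot cross a quote

print(lazy_ok, lazy_tricky, strict_tricky)
```

On the simple input both forms capture `/a`. On the second input the lazy version reports a bogus "href" spanning both tags, while the negated character class correctly reports nothing.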
vrish88
  • You're doing a capture that likely won't be used. Other than that, I don't see much difference between this and Gumbo's version. – kch May 04 '09 at 20:08
4

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
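As one possible illustration of the parser approach, here is a sketch using `html.parser` from Python's standard library. The class name `DevelopmentLinkCollector` and the sample markup are assumptions made for this example, and unlike the regexes in the other answers it does not check that the link text is `X`.

```python
from html.parser import HTMLParser


class DevelopmentLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class list contains 'development'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if "development" in classes and attr_map.get("href") is not None:
            self.hrefs.append(attr_map["href"])


# Hypothetical sample: double quotes, single quotes with reordered
# attributes, a tag wrapped across lines, and a non-matching class.
html = """
<a class="development" href="/a">X</a>
<a href='/b' class='development'>X</a>
<a class="development"
   href="/c">X
</a>
<a class="other" href="/d">X</a>
"""

collector = DevelopmentLinkCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)
```

The parser normalises quoting, attribute order, and line breaks, so all three `development` links are found without any special cases.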

Chas. Owens
1

Regex is generally a bad solution for HTML parsing, a topic which gets discussed every time a question like this is asked. For example, the element could wrap onto another line, either as

<a class="development" 
  href="[variable content]">X</a>

or

<a class="development" href="[variable content]">X
</a>
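To make the point concrete, this Python sketch runs the simple single-line pattern from the other answers against the two wrapped forms above. The href value `/a` and the variable names are placeholders invented for the example.

```python
import re

SIMPLE_RE = re.compile(r'<a class="development" href="[^"]*">X</a>')

# Hypothetical inputs mirroring the layouts shown above.
samples = {
    "one_line": '<a class="development" href="/a">X</a>',
    "wrapped_attr": '<a class="development" \n  href="/a">X</a>',
    "wrapped_text": '<a class="development" href="/a">X\n</a>',
}

# True where the pattern finds the link, False where it misses it.
results = {name: bool(SIMPLE_RE.search(text)) for name, text in samples.items()}
print(results)
```

Only the one-line form matches; both wrapped variants are silently missed even though a browser treats all three as the same link.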

What are you trying to achieve?

Using jQuery you could disable the links with:

$("a.development").on("click", function() { return false; });

or

$("a.development").attr("href", "#");
OtherDevOpsGene
  • This solution assumes that Itay Moav is using the jQuery library and that it's client-side parsing he wishes to achieve. – vrish88 May 04 '09 at 17:19
  • @vrish88: Correct. Thus the question "What are you trying to achieve?" and the comment "Using JQuery you could..." – OtherDevOpsGene May 04 '09 at 18:24
1

Here's a version that'll allow all sorts of evil to be put in the href attribute.

/<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?<\/a>/m

I'm also assuming X is going to be variable, so I added a non-greedy match there to handle it. The /m flag means . matches line breaks too (that is what /m does in Ruby; in Perl and PCRE the equivalent flag is /s).
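For readers outside Ruby, a rough Python equivalent might look like the sketch below, where `re.DOTALL` plays the role of the flag that lets `.` match line breaks. The sample links and the name `EVIL_HREF_RE` are invented for illustration.

```python
import re

# Same alternatives as above: double-quoted, single-quoted, or unquoted href.
EVIL_HREF_RE = re.compile(
    r"""<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?</a>""",
    re.DOTALL,  # let . match newlines
)

# Hypothetical sample links, one per quoting style, plus one with the
# attributes in a different order.
html = "\n".join([
    '<a class="development" href="/a">X</a>',
    "<a class=\"development\" href='/b'>Y</a>",
    '<a class="development" href=/c>Z\n</a>',
    '<a href="/d" class="development">X</a>',
])

matches = EVIL_HREF_RE.findall(html)
print(len(matches))
```

The first three links match, including the one whose text wraps onto a new line. The fourth does not, since the pattern still expects `class` to come immediately before `href`.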

kch