6

What is the regular expression for matching an HTML tag such as

<span id="id1" class="class1" attribute="attribute1">Test</span>

so that it matches the SPAN tag regardless of the attributes in it?

My existing rule is this, but it is not working:

/(<span [^>]*>)>/s
halfer
davidlee

3 Answers

10

David, the reason (<span [^>]*>)> is not working is that you have a small typo.

You see, that expression tries to match two closing > characters: look closely at the >)> at the end. For instance, it would match <span hey there>> but not <span hey there>.
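
To see the typo in action, here is a minimal sketch in Python (my choice of language, the question does not name one). The variable name `broken` is just for illustration, and `re.S` stands in for the `/s` modifier from the question.

```python
import re

# The pattern from the question, with the stray second ">" left in.
broken = re.compile(r"(<span [^>]*>)>", re.S)

# Only a tag followed by an extra ">" satisfies the pattern.
print(bool(broken.search("<span hey there>>")))  # True
print(bool(broken.search("<span hey there>")))   # False
```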

To match the opening span, make sure you only have one >.

With all the disclaimers about using regex to match html, this regex will do:

<span[^>]*>

If you sometimes expect SPAN, make sure to make it case-insensitive.
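
As a rough illustration, assuming Python's `re` module (the names `opening_span`, `html`, `found_lower` and `found_upper` are made up for this sketch), the corrected pattern with a case-insensitive flag behaves like this:

```python
import re

# Corrected pattern: a single closing ">", matched case-insensitively.
opening_span = re.compile(r"<span[^>]*>", re.IGNORECASE)

html = '<span id="id1" class="class1" attribute="attribute1">Test</span>'

# The closing </span> is not picked up, because "<" must be followed by "span".
found_lower = opening_span.findall(html)
found_upper = opening_span.findall('<SPAN class="x">Hi</SPAN>')

print(found_lower)  # ['<span id="id1" class="class1" attribute="attribute1">']
print(found_upper)  # ['<SPAN class="x">']
```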

Only if you have time: an additional flourish

In a comment, @DavidEhrmann points out that the regex above will also match <spanner>. If you want to make him happy and require a space after span whenever the tag is more than just <span>, you can use:

<span(?: [^>]*)?>

However, in my view, that is an unnecessary flourish. When we parse html with regex, we always know that we are using a crude tool, and we rely on the input to be fairly well-formed. For instance, with the revised regex above, there are still a million ways that we can match improper html, for instance: <span classification>
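
A small comparison of the two patterns, sketched in Python under the assumption that each candidate string is a single tag (the names `loose`, `strict` and `results` are illustrative only):

```python
import re

loose = re.compile(r"<span[^>]*>")          # the simple version
strict = re.compile(r"<span(?: [^>]*)?>")   # requires a space before any attributes

tags = ["<span>", '<span id="id1">', "<spanner>", "<span classification>"]

# Map each tag to (matched by loose, matched by strict).
results = {
    tag: (bool(loose.fullmatch(tag)), bool(strict.fullmatch(tag)))
    for tag in tags
}

for tag, (is_loose, is_strict) in results.items():
    print(f"{tag!r}: loose={is_loose} strict={is_strict}")
```

Only <spanner> is treated differently: the loose pattern accepts it and the strict one rejects it, while <span classification> passes both.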

What to do? Nothing. Know your tools, know what they can do, know the risks, and decide when the situation warrants regex and when it warrants a DOM parser.
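
For comparison, here is a sketch of the parser route using Python's standard-library `html.parser`. Note the assumptions: this is an event-based parser rather than a full DOM, and `SpanCollector` is a hypothetical helper written for this example, not something from the question.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <span> start tag."""

    def __init__(self):
        super().__init__()
        self.spans = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <SPAN> arrives as "span".
        if tag == "span":
            self.spans.append(dict(attrs))


collector = SpanCollector()
collector.feed(
    '<span id="id1" class="class1" attribute="attribute1">Test</span>'
    "<spanner>x</spanner>"
)

# Only the real span is collected; <spanner> is a different tag name.
print(collector.spans)
# [{'id': 'id1', 'class': 'class1', 'attribute': 'attribute1'}]
```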

zx81
  • That also matches <spanner> (not that it's a real tag). – David Ehrmann Jun 05 '14 at 06:52
  • @DavidEhrmann Sure, I know that. Within the context of using regex to match html, which davidlee is trying to do, I am trying to help fix his expression. IMO trying to make a regex for a <span> more specific would be error-fraught (what do we want to allow: classes? ids? styles?) – zx81 Jun 05 '14 at 06:56
  • @DavidEhrmann To let you rest at ease, added an option to avoid <spanner>, should davidlee also care. :) Cheers. – zx81 Jun 05 '14 at 07:06
  • Obligatory link to http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Aron Jun 27 '14 at 21:57
7

Take a look here: Regex for Html Tags

Try this:

<span[^>]*>[\s\S]+<\/span>
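
One thing worth checking with this pattern is greediness. The sketch below (Python, with made-up names `whole_span`, `lazy_span` and `html`) shows that `[\s\S]+` runs from the first opening tag to the last closing tag when the input has more than one span. The lazy `+?` variant is my own variation for comparison, not part of the answer above.

```python
import re

# The pattern as posted: greedy "+" between the tags.
whole_span = re.compile(r"<span[^>]*>[\s\S]+<\/span>")
# Assumed variation for comparison: lazy "+?" stops at the first </span>.
lazy_span = re.compile(r"<span[^>]*>[\s\S]+?<\/span>")

html = "<span>one</span> text <span>two</span>"

greedy = whole_span.findall(html)
lazy = lazy_span.findall(html)

print(greedy)  # ['<span>one</span> text <span>two</span>']
print(lazy)    # ['<span>one</span>', '<span>two</span>']
```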

Regex Demo



Vignesh Kumar A
  • Does not work http://rubular.com/r/F0WdIAOpC1 I would -1...except that is an awesome website...thanks. – Aron Jun 05 '14 at 06:40
3
/<span[^>]*>[^>]*<\/span>/
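
A quick check of what this pattern accepts, sketched in Python with the surrounding `/` delimiters dropped (the name `simple_span` and the sample strings are assumptions for illustration). Because the content between the tags is `[^>]*`, it works for plain text but not when the span contains another tag or a literal `>`.

```python
import re

# Same pattern without the /.../ delimiters, which Python does not use.
simple_span = re.compile(r"<span[^>]*>[^>]*<\/span>")

plain = '<span id="id1" class="class1" attribute="attribute1">Test</span>'
nested = "<span><b>Test</b></span>"

print(bool(simple_span.search(plain)))   # True
print(bool(simple_span.search(nested)))  # False: the inner <b> contains ">"
```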
Fabricator