Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I would like to ensure that HTML attributes have quotes around them, as required by XHTML.

For example:

<BODY link=#0000ff vLink=#800080>

should be

<BODY link="#0000ff" vLink="#800080">

I am looking for a Regex pattern that would handle this.

Thanks

Community
Peter
  • 1
  • Parsing Html The Cthulhu Way, http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html – Rubens Farias Sep 08 '10 at 01:57
  • 2
    This site has over 100 questions about parsing HTML with regular expressions, all of which have the same answer: don't even bother trying; it doesn't work, and no matter how clever you get with your REs, it still won't work. – Jerry Coffin Sep 08 '10 at 02:14
  • The `body` element is all lower case in the XHTML schema: http://www.w3.org/TR/xhtml1-schema/ XHTML is XML; if it fails validation, it's junk. – McDowell Sep 08 '10 at 20:46

1 Answer

Whilst not an exact duplicate, the basic answer is the same.

What you want is not regex, but a DOM parser.

Please specify your server-side language. Or do you intend to do this with JavaScript? If so, there is not much point, because by the time a script runs the browser has already parsed the markup.

One more suggestion: if the goal is valid XHTML, you should know that the `body` element (like all element and attribute names) must be written in lowercase.
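Since no server-side language was given, here is a minimal sketch of the parser-based approach, assuming Python is available. It uses the standard library's `html.parser.HTMLParser`, which is an event-based tokenizer rather than a full DOM parser, but it is enough to re-emit each tag with quoted attribute values. The names `AttributeQuoter` and `quote_attributes` are made up for this illustration. Note that the parser lowercases element and attribute names as a side effect, and that this alone does not make a document valid XHTML (for example, it does not close void elements).

```python
from html import escape
from html.parser import HTMLParser


class AttributeQuoter(HTMLParser):
    """Hypothetical helper: re-emit markup with double-quoted attribute values."""

    def __init__(self):
        # convert_charrefs=False keeps entities in text as separate events,
        # so they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _render(self, tag, attrs, closing=">"):
        rendered = "".join(
            # A valueless attribute (value is None) becomes name="name".
            f' {name}="{escape(name if value is None else value, quote=True)}"'
            for name, value in attrs
        )
        return f"<{tag}{rendered}{closing}"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def quote_attributes(html):
    """Return html with every attribute value wrapped in double quotes."""
    parser = AttributeQuoter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(quote_attributes("<BODY link=#0000ff vLink=#800080>"))
# prints: <body link="#0000ff" vlink="#800080">
```

For badly broken markup, a more forgiving parser or a dedicated tidy tool is likely a better fit than this sketch.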

alex