-1

I have this regular expression:

$buffer = preg_replace("/'([a-zA-Z0-9]+)'/iU",'$1',$buffer);

It removes single quotes when there's no whitespace between the quotes. It also replaces inside an HTML tag, and I don't want it to do so.

Here's an example

<div id="Foo"></div>

Should be:

<div id=Foo></div>

And

<script>Foo='Bar'</script>

Should not change and therefore be:

<script>Foo='Bar'</script>
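
For reference, a minimal reproduction of what the expression currently does with the two examples above. The pattern and replacement are copied from the question; the variable names are only illustrative. Note that the pattern only matches single quotes, so the double-quoted attribute is left alone while the script content is changed.

```php
<?php
// Pattern and replacement copied from the question.
$pattern = "/'([a-zA-Z0-9]+)'/iU";

// The two sample inputs from the question.
$tag    = '<div id="Foo"></div>';
$script = "<script>Foo='Bar'</script>";

// Apply the same replacement to both inputs.
$tagResult    = preg_replace($pattern, '$1', $tag);
$scriptResult = preg_replace($pattern, '$1', $script);

echo $tagResult, PHP_EOL;    // the double quotes are not matched by the pattern
echo $scriptResult, PHP_EOL; // the single quotes inside <script> are stripped
```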
Fredefl

1 Answer

5

HTML is unpredictable, and cannot be accurately handled with regular expressions. Unless you created the HTML and can be very, very certain of its exact format, use an HTML parser. Even if you can, the HTML parser is probably far easier to use, anyway.

Sorry :/
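
As a rough sketch of the parser route, assuming PHP's bundled DOM extension (`DOMDocument`) is available: the parser already knows which quoted strings are attribute values and which are script text, so there is no need to guess. The helper name `unquotableAttributes` is made up for illustration. It only reports the attributes whose values match the character class from the question; it does not rewrite the markup, because `DOMDocument::saveHTML()` writes attribute values quoted, so producing unquoted output would need a custom serializer that is not shown here.

```php
<?php
// Hypothetical helper: list the attributes whose values contain only
// letters and digits, i.e. the ones that could be written without quotes.
function unquotableAttributes(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // since real-world markup is often sloppy.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        foreach ($element->attributes as $attribute) {
            // Same character class as the regex in the question.
            if (preg_match('/^[a-zA-Z0-9]+\z/', $attribute->value) === 1) {
                $found[] = $element->nodeName . ' ' . $attribute->name . '=' . $attribute->value;
            }
        }
    }
    return $found;
}

$html = '<div id="Foo"></div><script>Foo=\'Bar\'</script>';
$attributes = unquotableAttributes($html);
print_r($attributes);

// The script body is plain text content, not an attribute, so it is untouched.
$doc = new DOMDocument();
$doc->loadHTML($html);
$scriptText = $doc->getElementsByTagName('script')->item(0)->textContent;
echo $scriptText, PHP_EOL;
```

The point of the design is that the attribute check never sees the contents of `<script>`, which is exactly the distinction the regex cannot make reliably.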

Matchu
  • *Always* use an HTML parser. HTML is not a regular language, and there's absolutely no reason to use regular expressions on it, especially not when HTML parsers are available. – You Jul 02 '11 at 18:05
  • Maybe we could turn it around and only replace stuff that's 'inside' a tag: Fx – Fredefl Jul 02 '11 at 18:07
  • @Fredefl: the trouble is that it's very difficult to know whether or not you're in a tag. Suppose I gave you ``, which you could easily find in the wild. It's not worth trying to use a regex when a test case like that could easily show up. – Matchu Jul 02 '11 at 18:09
  • HTML parsers can do that too, and catch scary exceptions you'll need 10+ characters of regex for. – NorthGuard Jul 02 '11 at 18:10
  • Can you give me a simple expression that does this? It can be filled with errors..... – Fredefl Jul 02 '11 at 18:45
  • @Fredefl: I wouldn't dare try. – Matchu Jul 02 '11 at 19:10