A PHP regex to remove whitespace in HTML

Question

Hi, I have some HTML like this:

<html>
   <head>
     <title>
          Some title
   </title>
</head>
<body>
    <div id="one">         some sample info </div>
</body>
</html>

How can I remove the whitespace in this HTML, except the whitespace inside the text content and inside the tags themselves, using a regex with preg_replace? I would like to get something like this:

<html><head><title>Some title</title></head><body><div id="one">some sample info</div></body></html>

Can anyone please help me with this? :)

what if there is `
` elements?
– Gordon, Feb 01 '12 at 12:09

Sufian Latif · Accepted Answer · 2012-05-16T06:11:18.757

You can replace `(?<=>)\s+(?=<)|(?<=>)\s+(?!=<)|(?!<=>)\s+(?=<)` with an empty string.

Edit: There's a simpler form: replace `(?<=>)\s+|\s+(?=<)` with an empty string.

Simply put, this regex replaces a run of one or more whitespace characters if it has a `>` immediately to its left or a `<` immediately to its right.

It has two parts joined by OR (symbol: `|`), so either one may match:

`(?<=>)\s+` - matches one or more whitespace characters (`\s+` in the regex) if they are preceded by a `>` (in regex: the lookbehind `(?<=>)`).
`\s+(?=<)` - matches one or more whitespace characters if they are followed by a `<` (in regex: the lookahead `(?=<)`).
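
As a minimal sketch of how the simpler pattern could be applied with preg_replace, here it is run against the sample markup from the question. The variable names (`$html`, `$pattern`, `$minified`) are assumptions made for illustration and are not part of the original answer.

```php
<?php
// Sample markup from the question, held in a nowdoc so it is taken literally.
$html = <<<'HTML'
<html>
   <head>
     <title>
          Some title
   </title>
</head>
<body>
    <div id="one">         some sample info </div>
</body>
</html>
HTML;

// Strip whitespace runs that directly follow a ">" or directly precede a "<".
$pattern = '/(?<=>)\s+|\s+(?=<)/';
$minified = preg_replace($pattern, '', $html);

echo $minified, PHP_EOL;
// <html><head><title>Some title</title></head><body><div id="one">some sample info</div></body></html>
```

The space inside `<div id="one">` survives because it has a letter on both sides, so neither lookaround matches there.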

Learn more about regex.

edited May 16 '12 at 06:11

answered Feb 01 '12 at 12:10

This answer is completely unstable and relies on the notion that there are no lingering `>` or `<` symbols in any of the text nodes in the HTML document. I would not recommend this technique to anyone. This is just another case where using regex to do a DOM parser's job is inappropriate. Researchers, please be informed that regex is "DOM-ignorant" -- it doesn't know if it is matching the start/end of a tag or merely something that resembles the start/end of a tag. At the very least, this regex is too primitive to do a consistently good job. – mickmackusa Nov 04 '21 at 09:17
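
To illustrate the concern in this comment, the sketch below feeds both approaches a text node that contains a literal `>`. The helper `trimHtmlTextNodes` is hypothetical (it appears in neither the answer nor the comment) and is only one possible way to trim text nodes with PHP's DOM extension. It assumes the input has a single root element and that trimming every text node is acceptable.

```php
<?php
// Hypothetical helper, not from the thread: trims text nodes through the DOM
// instead of guessing at tag boundaries with a regex.
function trimHtmlTextNodes(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // Ask libxml not to add an implied <html>/<body> wrapper or a doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array first so removals cannot disturb iteration.
    foreach (iterator_to_array($xpath->query('//text()')) as $text) {
        $trimmed = trim($text->data);
        if ($trimmed === '') {
            $text->parentNode->removeChild($text); // whitespace-only node between tags
        } else {
            $text->data = $trimmed; // keep inner content, drop the outer padding
        }
    }
    return $doc->saveHTML();
}

$sample = '<div> <p>  if a > b then  </p> </div>';

// The regex treats the ">" inside the sentence as if it closed a tag.
$regexResult = preg_replace('/(?<=>)\s+|\s+(?=<)/', '', $sample);
echo $regexResult, PHP_EOL; // <div><p>if a >b then</p></div>

// The DOM version only trims the ends of each text node.
$domResult = trimHtmlTextNodes($sample);
echo $domResult;
```

The regex result loses the space after the `>` in the sentence, while the DOM version leaves the inside of the text node alone. Note that the serializer may write the `>` as an entity.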
