36

I have HTML code like:

<div class="wrap">
    <div>
        <div id="hmenus">
            <div class="nav mainnavs">
                <ul>
                    <li><a id="nav-questions" href="/questions">Questions</a></li>
                    <li><a id="nav-tags" href="/tags">Tags</a></li>
                    <li><a id="nav-users" href="/users">Users</a></li>
                    <li><a id="nav-badges" href="/badges">Badges</a></li>
                    <li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li>
                </ul>
            </div>
        </div>
    </div>
</div>

How do I remove the whitespace between tags with PHP?

We should get:

<div class="wrap"><div><div id="hmenus"><div class="nav mainnavs"><ul><li><a id="nav-questions" href="/questions">Questions</a></li><li><a id="nav-tags" href="/tags">Tags</a></li><li><a id="nav-users" href="/users">Users</a></li><li><a id="nav-badges" href="/badges">Badges</a></li><li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li></ul></div></div></div></div>
Peter Mortensen
James
  • possible duplicate of [Remove all the line breaks from the html source](http://stackoverflow.com/questions/5258543/remove-all-the-line-breaks-from-the-html-source) – Gordon Mar 19 '11 at 13:00
  • 3
    I needed this - some email clients have bugs with whitespace between block elements. Since I'm cleaning the HTML before deployment, I needed a way of doing this. @Czechnology's regex pattern works perfectly - http://stackoverflow.com/a/5362207/582278. – Dan Blows Mar 06 '12 at 16:21
  • 6
    i wonder when people say what's the point of this. i need that too! and there's always a reason – Mbarry Apr 18 '13 at 15:07
  • 2
    I'm surprised nobody has suggested this as a way of solving the inline-block issue that breaks when whitespace is between the elements (often in grid systems, but also elsewhere). I haven't tried this yet, but I came here looking for an alternative to `...` in my source. – James S Aug 09 '13 at 17:42
  • I need this for writing tests against my code - I'm refactoring and the whitespace may change - I need to test the content not the whitespace. – ErichBSchulz Oct 21 '13 at 13:39

15 Answers

53

$html = preg_replace('~>\s+<~', '><', $html);

But I don't see the point of this. If you're trying to make the data size smaller, there are better options.
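A minimal sketch of how the one-liner above behaves. The helper name `stripBetweenTags` and the sample strings are assumptions made for illustration, they are not from the original post. The second call shows the side effect raised in the comments: whitespace between inline elements is removed as well.

```php
<?php
// Hypothetical helper wrapping the one-liner from this answer.
function stripBetweenTags(string $html): string
{
    // Remove every run of whitespace that sits directly between ">" and "<".
    return preg_replace('~>\s+<~', '><', $html);
}

$list = "<ul>\n    <li><a href=\"/tags\">Tags</a></li>\n    <li><a href=\"/users\">Users</a></li>\n</ul>";
echo stripBetweenTags($list), "\n";
// <ul><li><a href="/tags">Tags</a></li><li><a href="/users">Users</a></li></ul>

// Caveat: a meaningful space between two inline elements disappears too.
echo stripBetweenTags('<b>Hello</b> <i>world</i>'), "\n";
// <b>Hello</b><i>world</i>
```

If the space between inline elements matters, replacing with `'> <'` instead of `'><'` keeps one space at every tag boundary.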

Czechnology
  • 4
    Well, where no one else sees a point, someone else is seeing a lot of them, outside the box... :D This regex works perfect for me. – Max Kielland May 05 '11 at 00:17
  • Perfect and simple. Totally works. Thanks for the solution. And yes, just because the point isn't obvious doesn't mean there isn't one. I needed a way to find-and-replace broken tags from a 3rd-party program. Trimming out the white space in the tags helped me get there and solve the problem. – Jared Mar 13 '12 at 20:57
  • 21
    Sadly this changes `<b>Hello</b> <i>world</i>` to `<b>Hello</b><i>world</i>`. Detecting whether a white space is meaningful or not is almost impossible (a list of inline and block level elements will be handy). – Salman A Mar 19 '12 at 13:19
  • 1
    @SalmanA is right - you need to be very careful about this regex because there are some instances where you don't want to remove whitespace in between tags. This could be inside `
      
    – Simon East Aug 11 '12 at 11:18
  • 1
    @Simon, this regex does exactly what the OP wrote (s)he wants: "remove whitespace between tags". Obviously that might not be the best behaviour for all uses but that's up to the OP. – Czechnology Aug 12 '12 at 14:49
  • 1
    Yeah, it may be perfect for the OP's situation, and that's fine. I just think it's an important disclaimer for those Googling '*remove whitespace from HTML*' (like I was). – Simon East Aug 12 '12 at 22:48
  • In case someone cares: I use this answer to test my templates. I do not care if it adds whitespace as long as I get the expected HTML structure for my dummy data ;) The problem Salman mentioned is bad, but better than no tests at all. – Oliver A. Nov 02 '12 at 23:43
  • @Salman A: The DTD normally covers whether whitespace is significant or not. preg_replace knows nothing about it :) . Using a HTML parser can help (if *Tidy* is not an option): [Stripping line breaks pre-XML leaves spaces- what is the proper method?](http://stackoverflow.com/q/15872092/367456) – hakre Apr 08 '13 at 10:12
  • Useful for returning a block of code - for example form data. I used this on a Wordpress shortcode return and it's perfect as I have control of the text though. – rob_was_taken Sep 02 '14 at 20:29
  • If you're worried about that space between inline elements, you could always use this: $html = preg_replace('~>\s+<~', '> <', $html); (It's exactly the same thing, but with a space between the replacement angle-brackets) – Gershom Maes Nov 24 '14 at 20:00
  • You'd be surprised why this is useful...in my case, Wordpress's `wpautop` function was putting `
    ` tags inside my SVG elements.
    – Ben Visness Jun 23 '17 at 15:37
  • @Czechnology: You said that if you're trying to make the data size smaller, there are better options. What are these options? – Muhammad Rohail Oct 27 '19 at 10:29
  • @muhammadrohail Data compression. Send/store data e.g. gzipped. – Czechnology Nov 02 '19 at 13:23
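The compression option mentioned in the last comment can be compared against whitespace stripping with a rough sketch like the one below. It assumes the zlib extension is loaded (it provides `gzencode`), and the sample markup is made up for illustration, so the exact byte counts will differ for real pages.

```php
<?php
// Made-up sample markup: 50 indented list items.
$html = str_repeat("<li>\n        <a href=\"/questions\">Questions</a>\n    </li>\n", 50);

// Whitespace stripped with the regex from this answer.
$stripped = preg_replace('~>\s+<~', '><', $html);

// gzencode() requires the zlib extension (assumed to be available here).
$sizes = [
    'original'           => strlen($html),
    'stripped'           => strlen($stripped),
    'gzipped'            => strlen(gzencode($html)),
    'stripped + gzipped' => strlen(gzencode($stripped)),
];

foreach ($sizes as $label => $bytes) {
    printf("%-20s %d bytes\n", $label, $bytes);
}
```

In practice the compression is usually left to the web server rather than done in PHP; the sketch only makes the size difference visible.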
13

It's been a while since this question was first asked but I still see the need to post this answer in order to help people with the same problem.

None of these solutions were suitable for me, so I came up with this one: using output buffering.

The function ob_start accepts a callback as an argument, which is applied to the whole buffered string before it is output. So if you remove the whitespace from the string in that callback, you're done.

<?php
/**
 * Collapse each run of whitespace in the buffer into a single space.
 *
 * @param string $buffer
 * @return string
 */
function removeWhitespace($buffer)
{
    return preg_replace('/\s+/', ' ', $buffer);
}

ob_start('removeWhitespace');
?>
<!DOCTYPE html>
<html>
    <head></head>
    <body></body>
</html>
<?php
ob_get_flush();

The above would print something like:

<!DOCTYPE html> <html> <head></head> <body></body> </html>

Hope that helps.

HOW TO USE IT IN OOP

If you're using object-orientated code in PHP you may want to use a call-back function that is inside an object.

If you have a class called, for instance, HTML, you have to use this line of code (removeWhitespace must then be a static method of that class):

ob_start(["HTML","removeWhitespace"]); 
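A minimal sketch of what such a class could look like. Only the class name `HTML` and the method name `removeWhitespace` come from the answer; the class body and the sample markup are assumptions.

```php
<?php
// Hypothetical class: only the names HTML and removeWhitespace come from the answer.
class HTML
{
    // Static, so it can be passed to ob_start() as ['HTML', 'removeWhitespace'].
    public static function removeWhitespace(string $buffer): string
    {
        // Collapse each run of whitespace into a single space.
        return preg_replace('/\s+/', ' ', $buffer);
    }
}

ob_start(['HTML', 'removeWhitespace']);
echo "<p>\n    Hello\n</p>";
ob_end_flush(); // prints: <p> Hello </p>
```

For a non-static method, the callback would be an array holding an object instance and the method name instead of the class name.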
Savas Vedova
  • 2
    Savas, doesn't this remove the spaces you need aswell? say: `
    I need spaces here.
    There's a space to remove before this div.
    `
    – Jomar Sevillejo Sep 29 '15 at 00:21
  • 1
    @Jomar: no, it collapses sequences of multiple white-space characters into a single space. The example output in this answer is incorrect; it should be ` `. – Zilk Sep 29 '15 at 10:28
  • 1
    @JomarSevillejo my bad sorry, I updated the output as stated by Zilk. – Savas Vedova Oct 01 '15 at 13:59
5

Just in case someone still needs this: I put together a function from @Martin Angelova's and @Savas Vedova's responses. The result, which also solved my problem, looks like this:

<?php
function rmspace($buffer)
{
    // Remove whitespace between tags only when it contains a line break.
    return preg_replace('~>\s*\n\s*<~', '><', $buffer);
}
?>
<?php ob_start("rmspace"); ?>
   <!-- Content goes in here -->
<?php ob_end_flush(); ?>

Note: I did not test the performance penalty in a production environment

P.M
4

A RegEx replace could do the trick, something like:

$result = preg_replace('!\s+!smi', ' ', $content);
laander
3
$html = preg_replace('~>\s*\n\s*<~', '><', $html);

I think this is the solution to the <b>Hello</b> <i>world</i> problem. The idea is to remove whitespace only when it contains a new line. It works for HTML formatted in the common way, with one tag per line:

<div class="wrap">
    <div>
    </div>
</div>
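A short sketch of the difference this pattern makes, using sample strings chosen for illustration. The variable name `$pattern` is an assumption.

```php
<?php
// Only whitespace that contains a line break is removed between tags.
$pattern = '~>\s*\n\s*<~';

// A plain space between inline elements is left alone.
echo preg_replace($pattern, '><', '<b>Hello</b> <i>world</i>'), "\n";
// <b>Hello</b> <i>world</i>

// Line breaks plus indentation between tags are removed.
echo preg_replace($pattern, '><', "<div class=\"wrap\">\n    <div>\n    </div>\n</div>"), "\n";
// <div class="wrap"><div></div></div>
```

Two inline elements that happen to be separated by a line break in the source would still be joined, so this narrows the problem rather than removing it.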
Servy
2

Thank you for posting this question. The problem is indeed dealing with whitespace bugs in certain environments. While the regex solution works in the general case, for a quick hack you can remove the leading whitespace and add `<?php ?>` tags to the end of each line. PHP removes the newline that directly follows a closing `?>`. E.g.:

<ul><?php ?>
<li><a id="nav-questions" href="/questions">Questions</a></li><?php ?>
<li><a id="nav-tags" href="/tags">Tags</a></li><?php ?>
<li><a id="nav-users" href="/users">Users</a></li><?php ?>
<li><a id="nav-badges" href="/badges">Badges</a></li><?php ?>
<li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li><?php ?>
</ul>

Obviously this is sub-optimal for a variety of reasons, but it'll work for a localized problem without affecting the entire tool chain.

Chris
2

The array reduce function:

$html = explode("\n", $html);
function trimArray($returner, $value) {
    $returner .= trim($value);
    return $returner;
}
echo $html = array_reduce($html, 'trimArray');
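The same line-by-line trimming can also be written more compactly. This is a sketch under the assumption that the input uses `\n` line endings; the function name `trimLines` is hypothetical.

```php
<?php
// Hypothetical helper: trim every line, then join the lines without a separator.
function trimLines(string $html): string
{
    return implode('', array_map('trim', explode("\n", $html)));
}

echo trimLines("<ul>\n    <li>One</li>\n    <li>Two</li>\n</ul>"), "\n";
// <ul><li>One</li><li>Two</li></ul>
```

Like the array_reduce version, this joins text that is wrapped across lines without inserting a space, so it is only safe when line breaks occur between tags.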
Martin Bean
Zeigen
2

gpupo's post provided the cleanest solution for many different kinds of whitespace formatting. However, a minor but important piece was forgotten at the end: a final string trim :-p

Below is a tested and working solution.

function compress_html($content)
{
    $i       = 0;
    $content = preg_replace('~>\s+<~', '><', $content);
    $content = preg_replace('/\s\s+/',  ' ', $content);

    while ($i < 5)
    {
        $content = str_replace('  ', ' ', $content);
        $i++;
    }

    return trim($content);
}
tfont
1
//...
public function compressHtml($content)
{
    $content = preg_replace('~>\s+<~', '><', $content);
    $content = preg_replace('/\s\s+/', ' ', $content);
    $i = 0;
    while ($i < 5) {
        $content = str_replace('  ', ' ', $content);
        $i++;    
    }

    return $content;
}
gpupo
  • Tested, and this is the solution! See my below version for a minor update on the return forgetting a full string trim. – tfont Dec 11 '15 at 15:21
1

This works for me and it's easy to add/remove special cases. Works with CSS, HTML and JS.

function inline_trim($t)
{
    $t = preg_replace('/>\s*\n\s*</', '><', $t); // line break between tags
    $t = preg_replace('/\n/', ' ', $t); // line break to space
    $t = preg_replace('/(.)\s+(.)/', '$1 $2', $t); // spaces between letters
    $t = preg_replace("/;\s*(.)/", ';$1', $t); // semicolon and letter
    $t = preg_replace("/>\s*(.)/", '>$1', $t); // tag and letter
    $t = preg_replace("/(.)\s*</", '$1<', $t); // letter and tag
    $t = preg_replace("/;\s*</", '<', $t); // semicolon and tag
    $t = preg_replace("/;\s*}/", '}', $t); // semicolon and closing curly brace
    $t = preg_replace("/(.)\s*}/", '$1}', $t); // letter and closing curly brace
    $t = preg_replace("/(.)\s*{/", '$1{', $t); // letter and opening curly brace
    $t = preg_replace("/{\s*{/", '{{', $t); // curly brace and curly brace
    $t = preg_replace("/}\s*}/", '}}', $t); // curly brace and curly brace
    $t = preg_replace("/{\s*([\w|.|\$])/", '{$1', $t); // curly brace and letter
    $t = preg_replace("/}\s*([\w|.|\$])/", '}$1', $t); // curly brace and letter
    $t = preg_replace("/\+\s+\'/", "+ '", $t); // plus and quote
    $t = preg_replace('/\+\s+\"/', '+ "', $t); // plus and double quote
    $t = preg_replace("/\'\s+\+/", "' +", $t); // quote and plus
    $t = preg_replace('/\"\s+\+/', '" +', $t); // double quote and plus

    return $t;
}
Tengiz
1

If you have 8-bit ASCII, this replaces the control characters (and byte 0xFF) with a space and keeps the other characters in the range 128-254:

 $text = preg_replace('/[\x00-\x1F\xFF]/', " ", $text );

If you have a UTF-8 encoded string, this will do the job:

$text = preg_replace('/[\x00-\x1F\x7F]/u', '', $text);
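A small sketch of what the UTF-8 variant does to markup; the sample string and the variable name `$clean` are assumptions. Note that it removes tabs and line breaks everywhere, including inside text, not only between tags.

```php
<?php
// Sample input with a tab and line breaks inside and around the text.
$text = "<p>\n\tHello\tworld\n</p>";

// Remove ASCII control characters (0x00-0x1F) and DEL (0x7F).
$clean = preg_replace('/[\x00-\x1F\x7F]/u', '', $text);

echo $clean, "\n";
// <p>Helloworld</p>
```

Ordinary spaces (0x20) are not touched. With the `u` modifier, preg_replace returns null when the input is not valid UTF-8, so the result may need a null check.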

For more information, see this link.

0
<?php
// The constant name must be a quoted string.
define('COMPRESSOR', 1);

function remove_html_comments($content = '') {
    return preg_replace('/<!--(.|\s)*?-->/', '', $content);
}

function sanitize_output($buffer) {
    $search = array(
        '/\>[^\S ]+/s',  // strip whitespaces after tags, except space
        '/[^\S ]+\</s',  // strip whitespaces before tags, except space
        '/(\s)+/s'       // shorten multiple whitespace sequences
    );

    $replace = array(
        '>',
        '<',
        '\\1'
    );

    $buffer = preg_replace($search, $replace, $buffer);
    return remove_html_comments($buffer);
}

if (COMPRESSOR) { ob_start("sanitize_output"); }
?>

    <html>  
        <head>
          <!-- comment -->
          <title>Example   1</title>
        </head>
        <body>
           <p>This is       example</p>
        </body>
    </html>


    RESULT: <html><head><title>Example 1</title></head><body><p>This is example</p></body></html> 
  • While this code snippet may solve the question, [including an explanation](http://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – msrd0 Mar 08 '15 at 20:43
0

I used this regex for me and it works like a charm:

preg_replace('/[ \t]+(?!="|\')/', '', $html);

This pattern matches runs of spaces and tabs (at least one character) that are not followed by `="` or `'`. The lookahead is meant to avoid removing whitespace between HTML attributes.

alpham8
0

Use regular expressions, like:

>(\s).*?<
bluefoot
-4

I can't delete this answer but it's no longer relevant, the web landscape has changed so much in 8 years that this has become useless.

Incognito
  • 30
    Google (more than qualified when it comes to performance) suggest via their Page speed tool, that it IS worth doing. When you use GZIP it will compress the extra unnecessary spaces. Obviously, if you remove this spaces before it is GZIP'd then of course the output will be smaller and more efficient. The answer is both! – Phil Ricketts Jun 19 '11 at 23:18
  • 3
    This is true. The real question comes down to scale and effort required. Remember, your time is finite, and so is your product. If you're serving 1000 hits a month on 200kb of html content, don't worry. If you're serving 1M hits a month on 5mb of HTML content, optimize like never before. If you have time as a luxury and want to learn how to do this, go ahead, but stripping whitespace to save 50% instead of 40% isn't going to reward you in many places except ySlow. – Incognito Jun 20 '11 at 13:19
  • That being said, if you're actually having problems with slow loading, there's a tool I use that's very usable for pinpointing issues and tracking history: gtmetrix.com – Incognito Jun 20 '11 at 13:24
  • 2
    I propose that this answer is downvoted, because it is incorrect. http://stackoverflow.com/questions/807119/gzip-versus-minify – Phil Ricketts Jun 23 '11 at 14:56
  • @replete Your linked question is about JavaScript; this question is about removing whitespace from HTML for the sake of increased speed, a gain which, as explained, is negligible if we use gzip. The sample code in this question is 556 bytes, the gzipped size is 202 bytes, and the size with whitespace stripped is 362. That's right, it's larger if we don't gzip. – Incognito Jun 23 '11 at 15:20
  • @Incognito I understand that from a practical point of view, you are proposing that removing whitespace before gzipping gives little gain over just gzipping. Obviously, minifying first DOES make a difference to the gzipped output. But, you are making an absolute statement saying that there is no point. It's not correct, you should refine your answer. A notable point: In the context of websites, Google actually favours fast sites, to a degree. Look at Google Page Speed - it _does_ care if your site is minified or not! – Phil Ricketts Jun 27 '11 at 09:37
  • @Incognito but if you gzip the minified 362, it would be even smaller than 202 bytes. That's my point. – Phil Ricketts Jun 27 '11 at 09:41
  • 3
    @replete Right, it comes down to a whopping 183b, a whole 19b smaller. This is what I'm saying: after 1 000 000 page views, your savings in this situation would be 18 megabytes, and you've ended up breaking all your PRE tag content. Again, you shouldn't need to strip formatting from your HTML files; the servers deal with this. Why would you ever want to edit the file itself? All of this optimization should be done by the webserver, it's what it was built for. – Incognito Jun 27 '11 at 13:11
  • I thought Gziping and removing white-spaces is the best answer assuming your website is enough big and often visiting every day. Now I'm confused. If you look on some big websites source code you can see they all using stripping white spaces technique. For instance: * view-source:https://www.facebook.com/ * view-source:https://www.google.com/ * view-source:https://soundcloud.com/stream So is there any resource describing this problem in more details? – sobi3ch Oct 05 '13 at 13:10
  • If your template engine doesn't do it for you, you're not caching and using ESI, you're probably missing the point of doing gzip and strip space. – Incognito Oct 07 '13 at 12:49
  • 2
    Everything I google about this, Results in horrible answers like these. It's like people don't even think about the needs of others. Maybe they are asking the question for a different reason than minifying the website? I, for example, have to save some templating in the database. I simply want to compress my html for the database and not the eventual rendering. Sjeez. – NoobishPro Nov 12 '14 at 09:42
  • 1
    The questions does not state that's it's for web page compression. For example, I need to trim whitespaces because HTML to PDF generator renders extra whitespaces – gskema Oct 28 '16 at 08:25
  • The question was about how to do it, not whether it's recommended. – Dylan Kinnett Jun 07 '19 at 19:58