What would the JavaScript regex be to minify the contents of an HTML string? Note that I only want to remove runs of spaces that are >2, and nothing below that.

I also want to replace single quotation marks ' ' with double " ".

This is what I got so far, although I'm guessing there's a more efficient way of doing this:

var findSpaces = content.match(' ') >= 2;
var findQuotes = content.match(" ' ");

content.replace(findSpaces, "" );

content.replace(findQuotes, ' " ' );

No jQuery please

TylerH
user3143218
  • I don't think this is a situation where you can "_roll your own_" and expect to get it right without spending hours and hours... If you *KNOW* you will *ALWAYS* operate *ONLY* on trivially simple HTML, then you _might_ have a chance... – jahroy Apr 25 '14 at 05:42
  • Replacing single quotation marks with double quotation marks will break code where double quotation marks are contained in a string. Removing all extra white space without regard to if the space is inside of quotes can also break code. – Wayne Aug 19 '14 at 05:30
  • Related - https://stackoverflow.com/q/44841365/104380 – vsync Dec 16 '19 at 12:52

2 Answers

1

In the example below, the first pass removes all newlines (\r\n) and spaces that sit between adjacent HTML tags. The second pass minifies the text content: each remaining run of whitespace is collapsed to a single space, while the tags themselves (anything matched by <.*?>) are passed through unchanged.

Finally, trim() removes any whitespace before and after the resulting string.

// dummy string to minify
var s = `

    <div   value="a"     class="a b"   id="a">
      <div>
        foo   bar  
        <br><br>
        <span>baz</span>   <i>a</i>  
      </div>
    </div>
`

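function minify( s ){
  return s
    // pass 1: remove newlines and spaces between adjacent tags
    .replace(/\>[\r\n ]+\</g, "><")
    // pass 2: keep each tag as-is ($1), collapse any other whitespace run to one space
    .replace(/(<.*?>)|\s+/g, (m, $1) => $1 ? $1 : ' ')
    // remove leading and trailing whitespace
    .trim()
}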

console.log(  minify(s)  )

The above is also available as a gist in my collection
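
To make the two passes concrete, the following sketch splits them up on the same sample string so the final result can be inspected. The variable names (sample, betweenTags, collapsed, result) are only illustrative and are not part of the original snippet.

```javascript
// Same sample markup as above.
const sample = `

    <div   value="a"     class="a b"   id="a">
      <div>
        foo   bar
        <br><br>
        <span>baz</span>   <i>a</i>
      </div>
    </div>
`;

// Pass 1: drop whitespace that sits directly between two tags.
const betweenTags = sample.replace(/>[\r\n ]+</g, "><");

// Pass 2: keep every tag as-is, collapse any other whitespace run to one space.
const collapsed = betweenTags.replace(/(<.*?>)|\s+/g, (m, tag) => (tag ? tag : " "));

// Remove the single leading/trailing space left over from pass 2.
const result = collapsed.trim();

console.log(result);
// <div   value="a"     class="a b"   id="a"><div> foo bar <br><br><span>baz</span><i>a</i></div></div>
```

Note that whitespace inside the opening tag is left alone, because the whole tag is matched and passed through, and that the space between </span> and <i> is dropped by the first pass.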

vsync
  • I'm not sure that removing white spaces between phrasing content elements (aka "inline elements" prior to HTML5) is what is usually needed from code minification. It can significantly change the meaning of the content. – Ilya Streltsyn Dec 16 '19 at 13:58
  • I do it in all my projects where I have html templates to inject into the DOM, and it's actually a must-do. spaces between tags *might* mess up layout and interfere with CSS, and removing those spaces never caused me any harm, when you know on what to apply this to of course – vsync Dec 16 '19 at 14:26
  • so you noticed the potential problem in my comment above, right? ;) BTW, I strongly believe that spaces _never_ interfere with CSS, it's (suboptimal) CSS that might interfere with them and mess up the layout (like using inline-blocks for horizontal arrangement of blocks, that Flexbox is designed for:) – Ilya Streltsyn Dec 17 '19 at 14:26
  • Just an [example](https://stackoverflow.com/q/5078239/104380) out of many when minifying content is helpful. This question is one of the *most popular* on this website. Another popular example [here](https://stackoverflow.com/q/2628050/104380). People keep asking this and getting stuck on such things without knowing the importance of removing spaces between elements that *shouldn't* be there – vsync Dec 17 '19 at 19:03
  • Yes, this is the most notorious example of applying the CSS mechanism where spaces are meaningful (inline formatting) to the task where it shouldn't (horizontal layout of blocks) unsurprisingly giving unwanted results. So the correct question is "How to make the layout not depending on source formatting?" and the correct answer is using the true layout mechanism (most likely Flexbox) instead of faking it with wrong means. In 2011 this question made sense, but now it should become history. And the silver bullet illusion that auto-removing spaces might give could cause other problems. – Ilya Streltsyn Dec 17 '19 at 19:32
0

var s = `

    <div   value="a"     class="a b"   id="a">
      <div>
        foo bar  
        <br><br>
        <span>baz</span>   <i>a</i>  
      </div>
    </div>
`

console.log(
  s.replace(/\s{2,}/g, ' ').replace(/\'/g, '"')
)

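This should do the job for you: it collapses every run of two or more whitespace characters to a single space and replaces every single quote with a double quote.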
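
If the quote replacement should leave apostrophes in text and values that already contain double quotes alone, a more conservative variant might look like the sketch below. swapAttrQuotes is a hypothetical helper, not part of the answer above, and it assumes that attribute values contain no < or > and that the input has no inline <script> or <style> blocks.

```javascript
// Hypothetical helper: convert 'single-quoted' attribute values to "double-quoted"
// ones, but only inside tags and only when the value contains no double quote.
function swapAttrQuotes(html) {
  // Assumption: a tag is "<" ... ">" with no "<" or ">" inside it.
  return html.replace(/<[^<>]+>/g, (tag) =>
    // Match double-quoted values first so apostrophes inside them are skipped;
    // a single-quoted value is rewritten only if it has no double quote in it.
    tag.replace(/"[^"]*"|'([^'"]*)'/g, (m, single) =>
      single === undefined ? m : `"${single}"`
    )
  );
}

console.log(swapAttrQuotes(`<a href='x' title="it's">don't</a>`));
// <a href="x" title="it's">don't</a>
```

Text outside tags is never touched, so the apostrophe in "don't" survives, and a value such as 'say "hi"' is left in single quotes rather than being turned into invalid markup.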

aelor
  • What if there are escaped single quotes? What if there are single quotes that aren't used in HTML attributes? What if there are _preformatted_ pieces of content that contain multiple spaces and need to be preserved? – jahroy Apr 25 '14 at 05:33
  • then also it will match the single quote , do you not want to match the escaped single quotes ? – aelor Apr 25 '14 at 05:35
  • The OP has made little effort to define his actual requirements, but he did mention that he wants to "_minify HTML_", which would indicate that all these possibilities need to be considered. – jahroy Apr 25 '14 at 05:39
  • This works pretty well. Although what about when I'm placing "\s{2,}" inside a variable to use later? i.e var = "\s{2,}/g" – user3143218 Apr 25 '14 at 06:28
  • It's not good enough. As you can see from the example, if an HTML string has attributes with more than a single space between them, the whole string will become invalid because all the spaces will be removed – vsync Dec 16 '19 at 12:37
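
On the question in the comments about keeping the pattern in a variable: a string such as "\s{2,}/g" will not behave like a regex, because the flags are not part of the pattern and "\s" in an ordinary string literal is just "s". One option, sketched here with illustrative variable names, is to store the pattern source and build the regex with the RegExp constructor; a regex literal can also be assigned to a variable directly.

```javascript
// Pattern source as a string: the backslash must be doubled inside a string literal.
const patternSource = "\\s{2,}";

// Flags go in the second argument, not inside the pattern string.
const multiSpace = new RegExp(patternSource, "g");

// Alternative: a regex literal stored in a variable needs no extra escaping.
const multiSpaceLiteral = /\s{2,}/g;

const input = "a   b \n\n c";
console.log(input.replace(multiSpace, " "));        // a b c
console.log(input.replace(multiSpaceLiteral, " ")); // a b c
```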