2

I want to remove HTML and JavaScript comments automatically. I am using Ant scripts for deployment and JSF on the server. What options or tools are available? Thanks in advance.

Jochen
  • You may be able to do it with a regular expression, but HTML is notoriously difficult to parse with regex. – shauneba Jan 30 '13 at 09:04
  • Does this answer your question? [Remove HTML comments with Regex, in Javascript](https://stackoverflow.com/questions/5653207/remove-html-comments-with-regex-in-javascript) – justFatLard Oct 31 '20 at 07:06

4 Answers

3

Stripping comments with regexes from files that mix HTML and JavaScript is risky. Handled separately, though, each kind can be removed with good performance and no external tools beyond node.js:

For HTML comments, use the regex /<!--(?!>)[\S\s]*?-->/g. Example:

function stripHtmlComments(content) {
  return content.replace(/<!--(?!>)[\S\s]*?-->/g, '');
}
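
As a quick check of the function above, here is a minimal sketch that runs it on a made-up sample string (the markup in `sample` is an assumption for illustration only). It suggests two things: the lazy quantifier stops at the first `-->`, so content between two comments survives, and `[\S\s]` lets a single comment span several lines.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function stripHtmlComments(content) {
  return content.replace(/<!--(?!>)[\S\s]*?-->/g, '');
}

// Hypothetical sample input: two comments, the second one multi-line.
var sample = '<p>a</p><!-- one --><p>b</p><!--\n two \n--><p>c</p>';

var cleaned = stripHtmlComments(sample);
console.log(cleaned); // prints: <p>a</p><p>b</p><p>c</p>
```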

Removing JavaScript comments is a bit more complex: you need to combine several regexes to tell when a comment marker sits inside a string literal or a regex, and when a slash belongs to a regex :)
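
To see why a single regex is not enough, consider this small sketch. The function name `naiveStripLineComments` and the sample line are hypothetical, chosen only to illustrate the failure: a pattern that treats every `//` as the start of a comment also cuts into string literals that happen to contain `//`.

```javascript
// Naive approach: treat everything from "//" to the end of the line as a comment.
function naiveStripLineComments(source) {
  return source.replace(/\/\/[^\n]*/g, '');
}

// Hypothetical input: the "//" inside the string literal is not a comment.
var code = 'var url = "http://example.com"; // home page';

var broken = naiveStripLineComments(code);
console.log(broken); // prints: var url = "http:
```

The string literal is truncated, which is why strings and regex literals have to be matched explicitly and passed through unchanged.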

This tiny program removes both multiline and single-line comments from JavaScript files:

#!/usr/bin/env node
/*
  Removes multiline and single-line comments from a JavaScript source file.
  Author: aMarCruz - https://github.com/aMarCruz
  Usage: node [this-tool] [js-file]
*/
var path = require('path'),
    fs = require('fs'),
    file,
    str;

var RE_BLOCKS = new RegExp([
    /\/(\*)[^*]*\*+(?:[^*\/][^*]*\*+)*\//.source,           // $1: multi-line comment
    /\/(\/)[^\n]*$/.source,                                 // $2 single-line comment
    /"(?:[^"\\]*|\\[\S\s])*"|'(?:[^'\\]*|\\[\S\s])*'/.source, // string, don't care about embedded eols
    /(?:[$\w\)\]]|\+\+|--)\s*\/(?![*\/])/.source,           // division operator
    /\/(?=[^*\/])[^[/\\]*(?:(?:\[(?:\\.|[^\]\\]*)*\]|\\.)[^[/\\]*)*?\/[gim]*/.source  // regex literal
    ].join('|'),
    'gm'  // note: the global+multiline combination with replace() still needs testing
    );

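file = process.argv[2];
if (!file) {
    // no input file given: print usage and stop
    console.error('Usage: node [this-tool] [js-file]');
    process.exit(1);
}
if (!path.extname(file))
    file += '.js';   // default to the .js extension
str = fs.readFileSync(file, { encoding: 'utf8' });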

console.log(stripJSComments(str));

// remove comments, keep other blocks
function stripJSComments(str) {
    return str.replace(RE_BLOCKS, function (match, mlc, slc) {
        return mlc ? ' ' :     // multiline comment (must be replaced with one space)
               slc ? '' :      // single-line comment
               match;          // divisor, regex, or string, return as-is
        });
}

Save it as, for example, rcomms and run it with:

node rcomms source-file > clean-file.js

NOTE: This code is based on regexes from jspreproc. If you need more advanced processing, please visit http://github.com/aMarCruz/jspreproc.

I wrote jspreproc to deploy some riot modules. It removes empty lines, supports filters to preserve selected comments, and handles C-style conditional comments: #if/#else/#endif, #define, #include, etc.

aMarCruz
1

You can use regular expressions to remove them with ease. For example, you can remove HTML comments by replacing every match of the regular expression /\<!--(.*)-\>/gi with an empty string.

Licson
  • If a line has two comments with content in between, this will also remove the non-commented content between them, and it will not work for multi-line comments. To fix that, make the quantifier lazy rather than greedy (in some regex engines just a modifier: (.*?) ) and apply the regex to the entire file at once rather than line by line. – miyalys Aug 22 '19 at 12:28
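
The difference described in the comment above can be sketched as follows. The sample strings are made up for illustration, and the lazy pattern is one possible variant, not necessarily what either author had in mind. Note that `.` does not match a line break in JavaScript, so the lazy version uses `[\s\S]` instead.

```javascript
var GREEDY = /\<!--(.*)-\>/gi;   // the pattern from the answer
var LAZY = /<!--[\s\S]*?-->/g;   // lazy, and able to cross line breaks

// Two comments on one line with real content in between.
var html = '<!-- a -->keep<!-- b -->';
var greedy = html.replace(GREEDY, ''); // '' (the word "keep" is lost)
var lazy = html.replace(LAZY, '');     // 'keep'

// A comment that spans two lines.
var multiline = '<!-- first\nsecond -->text';
var greedyMulti = multiline.replace(GREEDY, ''); // unchanged, no match
var lazyMulti = multiline.replace(LAZY, '');     // 'text'

console.log([greedy, lazy, greedyMulti, lazyMulti]);
```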
1

The decomment library does exactly what you described: it removes comments from JSON, JavaScript, CSS, HTML, etc.

For use within the gulp system, see gulp-decomment.

vitaly-t
  • Note the following regarding mixed content: `The library does not support mixed content - HTML with JavaScript or CSS in it. Once the input code is recognized as HTML, only the HTML comments will be removed from it.` – gordon613 Aug 09 '16 at 14:30
0

Make a new target and use replaceregexp to strip all comments and anything else you don't want in these files.

You could do something like this for HTML, and something similar for JS:

<target name="-trim.html.comments">

    <fileset id="html.fileset"
        dir="${build.dir}"
        includes="**/*.jsp, **/*.php, **/*.html"/>

    <!-- HTML Comments -->
    <replaceregexp replace="" flags="g"
        match="\&lt;![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)\&gt;">
        <fileset refid="html.fileset"/>
    </replaceregexp>

</target>

Source: http://www.julienlecomte.net/blog/2007/09/23/

tbraun89