
I'm trying to pull code comment blocks out of JavaScript files. I'm building a lightweight code documentation tool.

An example would be:

/** @Method: setSize
 * @Description: setSize DESCRIPTION
 * @param: setSize PARAMETER
 */

I need to pull out the comments set up like this, ideally into an array.

I had gotten as far as this, but realize it may not handle newlines, tabs, etc.:

\/\*\*(.*?)\*\/

(Okay, this seems like it would be simple, but I'm going in circles trying to get it to work.)

Peter Mortensen
    I am not sure that regexp is the best tool to use for this one as you're dealing with multiple lines and parsing logic depends on whether it's the first/last/middle line... – Oleg Mikheev Feb 13 '12 at 15:09

3 Answers


Depending on what you want to do with the extracted docblocks, several approaches come to mind. If you simply need the docblocks without further references, String.match() may suffice. Otherwise you might need the index of each block, which RegExp.exec() gives you.

As others have already pointed out, JavaScript's regex engine is anything but powerful. If you're used to PCRE, this feels like working with your hands tied behind your back. [\s\S] (whitespace character or non-whitespace character) matches any character at all, line breaks included, which makes it a stand-in for dotAll mode.
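To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the [\s\S] variant. The `sample` string and the variable names are made up for illustration. (Engines that implement ES2018 also accept an `s` flag that makes `.` match line breaks, but the [\s\S] form works everywhere.)

```javascript
// A three-line docblock used only for this illustration.
var sample = '/** first line\n * second line\n */';

// `.` stops at line breaks, so the lazy group can never reach the closing "*/".
var dotPattern = /\/\*\*(.*?)\*\//;

// `[\s\S]` matches any character, including \n and \r.
var anyCharPattern = /\/\*\*([\s\S]*?)\*\//;

var dotResult = dotPattern.exec(sample);
var anyCharResult = anyCharPattern.exec(sample);

console.log(dotResult);          // no match across lines
console.log(anyCharResult[0]);   // the whole comment block
```

The only change between the two patterns is the character class inside the capture group; the lazy quantifier stays, so each match still ends at the first closing */.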

This should get you started:

var string = 'var foo = "bar";'
    + '\n\n'
    + '/** @Method: setSize'
    + '\n * @Description: setSize DESCRIPTION'
    + '\n * @param: setSize PARAMETER'
    + '\n */'
    + '\n'
    + 'function setSize(setSize) { return true; }'
    + '\n\n'
    + '/** @Method: foo'
    + '\n * @Description: foo DESCRIPTION'
    + '\n * @param: bar PARAMETER'
    + '\n */'
    + '\n'
    + 'function foo(bar) { return true; }';

var docblock = /\/\*{2}([\s\S]+?)\*\//g,
    trim = function(string){ 
        return string.replace(/^\s+|\s+$/g, ''); 
    },
    split = function(string) {
        return string.split(/[\r\n]\s*\*\s+/);
    };

// extract all doc-blocks
console.log(string.match(docblock));

// extract all doc-blocks with access to character-index
var match;
while ((match = docblock.exec(string)) !== null) {
    console.log(
        match.index + " characters from the beginning, found: ", 
        trim(match[1]), 
        split(match[1])
    );
}
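Since the question asks for an array, the same idea could be wrapped up roughly as follows. This is only a sketch: `extractDocblocks` and the shape of the returned objects are my own invention, and it assumes every continuation line is decorated with at most one leading `*`.

```javascript
// Hypothetical helper: returns one entry per /** ... */ block found in `source`.
function extractDocblocks(source) {
    // A fresh regex per call, so lastIndex always starts at 0.
    var pattern = /\/\*\*([\s\S]*?)\*\//g;
    var blocks = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        var lines = match[1].split(/\r?\n/);
        var cleaned = [];
        for (var i = 0; i < lines.length; i++) {
            // Drop the leading " * " decoration and any trailing whitespace.
            var line = lines[i].replace(/^\s*\*?\s?/, '').replace(/\s+$/, '');
            if (line !== '') {
                cleaned.push(line);
            }
        }
        blocks.push({ index: match.index, lines: cleaned });
    }
    return blocks;
}

var source = 'var foo = "bar";\n\n'
    + '/** @Method: setSize\n'
    + ' * @Description: setSize DESCRIPTION\n'
    + ' * @param: setSize PARAMETER\n'
    + ' */\n'
    + 'function setSize(setSize) { return true; }\n';

var blocks = extractDocblocks(source);
console.log(blocks);
```

Each entry keeps the character index of the block alongside its cleaned lines, so the block can later be tied back to the function that follows it.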
rodneyrehm

This should grab a comment block: \/\*\*[^/]+\/. I don't think a regex is the best way to generate an array from these blocks, though. This regex basically says:

Find a /** (the asterisk and forward slashes are escaped with \)

then find anything that isn't a /

then find one /

It's crude, but it should generally work as long as the comment body itself contains no /. Here's a live example: http://regexr.com?300c6
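To show where "crude" starts to matter, the sketch below runs this pattern and a lazy [\s\S]*? alternative over two made-up comments. The `@see` tag and the `docs/setSize.html` path are invented for the example and are not part of the question.

```javascript
var crude = /\/\*\*[^/]+\//g;
var lazy = /\/\*\*[\s\S]*?\*\//g;

// Illustrative inputs: one without a slash in the body, one with.
var plain = '/** @Method: setSize\n * @param: setSize PARAMETER\n */';
var withSlash = '/** @see: docs/setSize.html\n * @param: setSize PARAMETER\n */';

var plainCrude = plain.match(crude);
var slashCrude = withSlash.match(crude);
var slashLazy = withSlash.match(lazy);

console.log(plainCrude);  // the whole block
console.log(slashCrude);  // cut off at the "/" inside the path
console.log(slashLazy);   // the whole block
```

The crude pattern ends at the first / it meets, whether or not a * comes before it, so any path or URL inside a comment truncates the match.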

punkrockbuddyholly
  • A better way to find the end is to use the non-greedy pattern `.*?\*\/`. The first part (`.*?`) matches anything, but gets the shortest pattern that matches. Then `\*\/` matches the end of the comment. – mcrumley Feb 13 '12 at 15:59
  • @mcrumley That is a little cleaner, although you need the dotall flag enabled otherwise the `.*?` doesn't match return characters. I don't think javascript supports the dotall flag. – punkrockbuddyholly Feb 13 '12 at 16:04
  • @mcrumley This question confirms that the dotall flag isn't supported in javascript but suggests a workaround with `[\s\S]*?` http://stackoverflow.com/questions/1068280/javascript-regex-multiline-flag-doesnt-work – punkrockbuddyholly Feb 13 '12 at 16:06

What about some magic :)

comment.replace(/@(\w+)\s*\:\s*(\S+)\s+(\w+)/gim, function (match, tag, name, descr) {
    console.log(arguments);
    // Do something with tag, name and descr here ...
});

I haven't tested this, so there's no guarantee the regex is right. It's only meant to point you toward one possibility: doing a RegExp search the John Resig way, using replace() with a callback to visit each match 8-)
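For what it's worth, here is a sketch of that idea run against the question's sample. As far as I can tell, the regex above skips the @Method line there, because nothing follows the name on that line, so this variant makes the description optional and keeps it on the same line. `parseTags` and the object shape are assumptions of mine, not part of the answer.

```javascript
// Hypothetical helper: collects "@tag: name description" entries from one docblock.
function parseTags(comment) {
    var tags = [];
    // The description group is optional so that "@Method: setSize" still matches.
    var tagPattern = /@(\w+)\s*:\s*(\S+)(?:[ \t]+(.*))?/g;
    // replace() is used only to iterate over matches; its result is discarded.
    comment.replace(tagPattern, function (match, tag, name, descr) {
        tags.push({ tag: tag, name: name, description: descr || '' });
        return match;
    });
    return tags;
}

var comment = '/** @Method: setSize\n'
    + ' * @Description: setSize DESCRIPTION\n'
    + ' * @param: setSize PARAMETER\n'
    + ' */';

var tags = parseTags(comment);
console.log(tags);
```

Returning `match` from the callback leaves the string unchanged, which makes it clear that replace() is acting purely as an iterator here.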

mfeineis