I can't remove Python comments from source code using a JavaScript regular expression. I tried a negative lookahead to match inline comments while excluding any `#` that appears inside a string.

Here is what I've tried:

Regex: /#.*(?!')/gi

Test file:

class AAA:
    """
        item
    """
     ''''
    SRC_TYPE = (
        ('cs', 'src C# for all'),       # this is a comment a 'comment'
        ('cpp', 'C++'),
        ('ascript', 'a  '),#djshdjshdjshds
        ('script', 'tst C#')
    )

But it doesn't work.

LXG
  • Did you search on *removing comments + regex*? – revo May 25 '18 at 19:34
  • I mean remove comment (python comment) using regex yes. – LXG May 25 '18 at 19:36
  • You won't find a bulletproof regular expression for stripping comments from code, because a regex is not aware of the language's syntax, and any that exists will break under some conditions. There are lots of questions about removing comments, and none guarantees a correct match. – revo May 25 '18 at 20:00
  • Not sure about string syntax in Python, but you should be able to match both quotes and comments via alternation, then within a callback (JS) return the quoted part, or return the empty string if a comment. `/('[^']*'|"[^"]*")|(#.*)/` So, if group 1 matched, return group 1, else return ''. –  May 25 '18 at 20:38
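
The alternation idea from the last comment can be sketched roughly as follows. This is only an illustration of that suggestion, not a complete solution: the helper name `stripCommentsSimple` is hypothetical, and the pattern does not handle escaped quotes inside strings.

```javascript
// Sketch of the alternation-plus-callback idea (helper name is hypothetical).
// Group 1 matches a quoted string, group 2 matches a comment.
function stripCommentsSimple(source) {
  const pattern = /('[^']*'|"[^"]*")|(#.*)/g;
  return source.replace(pattern, (match, quoted) =>
    // If group 1 participated, keep the string; otherwise it was a comment.
    quoted !== undefined ? quoted : ''
  );
}

const line = "('cs', 'src C# for all'),       # this is a comment a 'comment'";
console.log(stripCommentsSimple(line));
```

Because the string alternative is tried first, a `#` inside quotes is consumed as part of the string and never reaches the comment alternative. Whitespace before the removed comment is left in place.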

1 Answer

This is tricky. I suggest the trash-can approach: let the full match consume everything that must not be replaced (the quoted strings), and capture the comments to be removed in group 1:

(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')|(#.*$)

Sample using replace with callback function:

const regex = /(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')|(#.*$)/gm;
const str = `class AAA:
    """
        item
    """
     ''''
    SRC_TYPE = (
        ('cs', 'src C# for all'),       # this is a comment a 'comment'
        ('cpp', 'C++'),
        ('ascript', 'a  '),#djshdjshdjshds
        ('script', 'tst C#')
    )`;

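// The result variable holds the source with comments removed.
// group1 is only set when the comment alternative (#.*$) matched.
const result = str.replace(regex, function(m, group1) {
    if (group1 == null) return m; // a quoted string: keep it unchanged
    return "";                    // a comment: drop it
});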

console.log('Substitution result: ', result);

The hard part, matching the quoted strings, is done by Casimir et Hippolyte's ECMAScript regex.

wp78de
  • Just to be clear, the first part of the regex (the trash can) grabs everything between single or double quotes and also deals with escaped quotes. That's why it looks so intimidating. – wp78de May 26 '18 at 19:44
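
To illustrate the point about escaped quotes, here is a small sketch that runs the answer's regex on a line containing `\'` inside a string. The wrapper name `stripComments` and the sample input are assumptions made for this example.

```javascript
// Same pattern as in the answer: quoted strings (with escapes) go into the
// full match, comments go into group 1.
const commentOrString = /(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')|(#.*$)/gm;

// Hypothetical wrapper around the replace-with-callback call.
function stripComments(source) {
  return source.replace(commentOrString, (match, comment) =>
    comment == null ? match : ''
  );
}

// String.raw keeps the backslash, so the input contains a literal \' sequence.
const input = String.raw`x = 'it\'s # not a comment'  # real comment`;
console.log(stripComments(input));
```

The `\\[\s\S]` part consumes a backslash together with whatever character follows it, so the escaped quote does not end the string early and the `#` inside it is left alone. Only the trailing comment is removed.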