
I am writing Python code to remove comments from source code, but I want to keep the title (header) block of the source code, like

//**********************************
//*author
//*Function
//**********************************

and

//example

I only want to remove // example (i.e. when there is a space right after //).

I based my code on the highest-scored answer to this question:

Using regex to remove comments from source files

import re

def remove_comments(string):
    pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
    # first group captures quoted strings (double or single)
    # second group captures comments (//single-line or /* multi-line */)
    regex = re.compile(pattern, re.MULTILINE|re.DOTALL)
    def _replacer(match):
        # if the 2nd group (capturing comments) is not None,
        # it means we have captured a non-quoted (real) comment string.
        if match.group(2) is not None:
            return "" # so we will return empty to remove the comment
        else: # otherwise, we will return the 1st group
            return match.group(1) # captured quoted-string
    return regex.sub(_replacer, string)

I changed the pattern slightly to

pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//(?!(\*|\w))[^\r\n]*$)"

It did not work for lines that start with //*.
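One way to narrow this down is to run only the single-line branch of the modified pattern. The sketch below uses made-up sample lines and swaps the capturing group inside the lookahead for a non-capturing (?:...) so that findall returns whole matches; otherwise it is the same branch. It suggests the lookahead itself behaves as intended: //*author and //example are skipped and only // example is matched, so the //* problem likely comes from the other alternative.

```python
import re

# Single-line-comment branch of the modified pattern, tested on its own.
# (?:...) instead of (...) so findall returns the full match, not the group.
line_comment = re.compile(r"//(?!(?:\*|\w))[^\r\n]*$", re.MULTILINE)

# Hypothetical sample lines.
sample = "//*author\n//example\n// example\n"
print(line_comment.findall(sample))  # ['// example']
```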

But when I changed * to #, like this:

pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//(?!(#|\w))[^\r\n]*$)"

//##################################
//#author
//#Function
//##################################

it worked.

I'm confused: what is the difference between # and * here? Thanks for your help.
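The difference appears to be in the block-comment alternative, /\*.*?\*/. In //*, the second slash plus the * forms /*, so the regex can treat it as the start of a block comment, and because of re.DOTALL the lazy .*? runs across lines up to the next */ anywhere in the file. //# contains no /*, so it never triggers that branch. The sketch below runs both patterns from the question over a hypothetical file whose only */ sits in a later block comment; that later */ is an assumption, and if a file contains no */ at all the star header is not affected.

```python
import re

# The two modified patterns from the question.
STAR_PATTERN = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//(?!(\*|\w))[^\r\n]*$)"
HASH_PATTERN = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//(?!(#|\w))[^\r\n]*$)"

def strip_comments(pattern, src):
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)
    # Keep quoted strings (group 1); drop anything matched as a comment (group 2).
    return regex.sub(lambda m: "" if m.group(2) is not None else m.group(1), src)

# Hypothetical file body: one line of code followed by a block comment.
body = "int x = 1;\n/* note */\n"

star_result = strip_comments(STAR_PATTERN, "//*author\n" + body)
hash_result = strip_comments(HASH_PATTERN, "//#author\n" + body)

print(repr(star_result))  # '/\n' : "/*author ... note */" was removed as one block comment
print(repr(hash_result))  # '//#author\nint x = 1;\n\n'
```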

ruud
Jinfeng
  • What language is the source code written in? – John Mee Feb 23 '23 at 04:54
  • I'm guessing the important part is the `//` which marks the line as a comment. Following it with `*` or `#` has no meaning... since it is a comment, anything following the initial `//` is ignored. – John Mee Feb 23 '23 at 04:56
  • Are you aware that the asterisk `*` is a special character when composing a regex? It is a quantifier ("zero or more"). So, if you want to match a literal `*` you need to escape it, like this `\*`. The `#` is not special, and does not need to be escaped. – John Mee Feb 23 '23 at 04:57
  • Thanks for everyone’s help. I use Python. Yes, you are right, the cause is the //. I changed my pattern to pattern = r"(\".*?\"|\'.*?\')|((?<!/)/\*.*?\*/|//(?!(\*|\w))[^\r\n]*$)" – Jinfeng Mar 01 '23 at 08:13
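The lookbehind in the last comment targets exactly the /* hidden inside //*: (?<!/) stops /\* from matching when the / is immediately preceded by another /. A quick check with that final pattern on a hypothetical input (header lines, both //example forms, and an ordinary block comment) keeps the header and //example, and removes // example and the block comment.

```python
import re

# Final pattern from the asker's comment: (?<!/) rejects a "/*" that directly follows "/".
PATTERN = r"(\".*?\"|\'.*?\')|((?<!/)/\*.*?\*/|//(?!(\*|\w))[^\r\n]*$)"
regex = re.compile(PATTERN, re.MULTILINE | re.DOTALL)

def strip_comments(src):
    # Keep quoted strings (group 1); drop comments (group 2).
    return regex.sub(lambda m: "" if m.group(2) is not None else m.group(1), src)

# Hypothetical input.
src = (
    "//**********\n"
    "//*author\n"
    "//example\n"
    "// example\n"
    "int x = 1; /* note */\n"
)
result = strip_comments(src)
print(repr(result))  # '//**********\n//*author\n//example\n\nint x = 1; \n'
```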

1 Answer


Assuming a string like this:

import re

# The \n sequences are line breaks, so several lines fit in a single string
string = '/**********************************\n//*author\n//*Function\n//*********************************\n//example\n// example'

and that you only want to remove //example, not anything inside the surrounding header block, you can do:

regex = re.compile(r'^(\/{1,2})(?![\*\s\/]{1,}).*$', re.MULTILINE)
re.sub(regex, "", string)

#output
'/**********************************\n//*author\n//*Function\n//*********************************\n\n// example'

The \n is still left where the removed text used to be. Also keep in mind that something like ///example would not be removed.
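If the leftover empty line is unwanted, one option is to consume the line break together with the comment. The variant below is an assumption of mine, not part of the answer: it replaces the trailing $ with \n?, so the matched line and its newline are removed together. The first pattern is the answer's own, shown for comparison.

```python
import re

# The string from the answer above.
string = '/**********************************\n//*author\n//*Function\n//*********************************\n//example\n// example'

# Answer's pattern: clears the line but leaves its "\n" behind.
keep_newline = re.compile(r'^(\/{1,2})(?![\*\s\/]{1,}).*$', re.MULTILINE)

# Hypothetical variant: ".*\n?" also removes the line break after the comment.
drop_newline = re.compile(r'^(\/{1,2})(?![\*\s\/]{1,}).*\n?', re.MULTILINE)

print(repr(keep_newline.sub("", string)))
print(repr(drop_newline.sub("", string)))
```

As with the original pattern, ///example is still left untouched by the variant.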

Shorn