Regular Expression for single line comments

Question

What is the regular expression for single-line Java comments? I am trying the following grammar:

     def single_comment(t):
          r'\/\/.~(\n)'
          #r'//.*$'
          pass

But I am unable to ignore the single-line comments. How can I do it?
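A quick check with Python's `re` module shows why the first pattern never fires. In `\/\/.~(\n)`, the `.~` part means "any one character followed by a literal tilde", so the pattern only matches text such as `//x~` followed by a newline. The commented-out `//.*$` does match. If the rule is meant for PLY (an assumption based on the `t` argument and the docstring-as-regex style), note also that PLY only collects rules whose names start with `t_`, and a rule that returns nothing is discarded, which is the usual way to ignore comments there.

```python
import re

line = '// a single-line comment'

# First pattern from the question: "//", any one character, a literal "~",
# then a newline. There is no "~" in the line, so nothing is found.
attempt = re.compile(r'\/\/.~(\n)')
print(attempt.search(line))           # None

# Commented-out alternative: "//" and everything up to the end of the line.
simple = re.compile(r'//.*$')
print(simple.search(line).group())    # // a single-line comment
```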

Is this `/*..I am single line too...*/` also a single-line comment? — Kent, Mar 15 '13 at 02:26
What input string are you matching against, and using which method? `re.match()` is anchored at the beginning of the input string (unlike `re.search()`, it only tries to match at position 0), so your second RE [won't match](http://ideone.com/lEZKnX) e.g. `'some code; // comment'`. — millimoose, Mar 15 '13 at 02:27
You appear to be using raw strings, but are still escaping the slashes? — Moshe, Mar 15 '13 at 02:30
What happens if you have what looks like a comment, **inside a string**? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Lasse V. Karlsen, Mar 15 '13 at 21:13
I've suggested a pyparsing answer, which finds the comments... that could be adapted to remove the comments if necessary... — Jon Clements, Mar 15 '13 at 21:20
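The first three comments can be checked directly with `re`. This is a small sketch using made-up input lines. `re.match()` only has to match starting at position 0 (it does not need to consume the whole string; that is what `re.fullmatch()` does), while `re.search()` scans forward. The last lines show the problem raised about comment-like text inside a string literal.

```python
import re

naive = re.compile(r'//.*$')
code = 'some code; // comment'

# match() only tries position 0, where there is no "//".
print(naive.match(code))             # None
# search() scans forward and finds the comment.
print(naive.search(code).group())    # // comment

# "\/" in a raw string is just an escaped "/", which is redundant because
# "/" has no special meaning in Python regular expressions.
print(re.search(r'\/\/', code).span() == re.search(r'//', code).span())  # True

# The naive pattern cannot tell a comment from "//" inside a string literal.
java = 'String url = "http://www.example.com";'
print(naive.search(java).group())    # //www.example.com";
```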

martega · Accepted Answer · score 4 · 2013-03-15T21:25:54.953

Here is a Python regular expression for matching single-line comments (it only matches comments that start with `//`, not `/* */`). Unfortunately, it is pretty ugly because it has to account for escaped characters and `//` inside strings. If you ever need this in real code, you should look for a more readable solution.

import re
pattern = re.compile(r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$')
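Because the pattern captures the comment text in group 1, and the `//` always sits immediately before that group, the same match can be used to remove the comment from a line, which is what the question ultimately wants. This is a sketch only: `strip_line_comment` is a hypothetical helper name, and like the pattern it works on one line at a time and ignores `/* */` comments.

```python
import re

# The pattern from this answer, unchanged. Group 1 captures the comment text.
pattern = re.compile(r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$')

def strip_line_comment(line):
    """Hypothetical helper: drop a trailing // comment from one line of Java."""
    m = pattern.match(line)
    if m is None:
        return line  # no "//" outside a string literal
    # The "//" is the two characters right before group 1.
    return line[:m.start(1) - 2].rstrip()

print(strip_line_comment('int x = 1; // counter'))              # int x = 1;
print(strip_line_comment('String url = "http://example.com";'))  # unchanged
```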

This is a little script that runs a bunch of test strings against the pattern.

import re

pattern = re.compile(r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$')

tests = [
    (r'// hello world', True),
    (r'     // hello world', True),
    (r'hello world', False),
    (r'System.out.println("Hello, World!\n"); // prints hello world', True),
    (r'String url = "http://www.example.com"', False),
    (r'// hello world', True),
    (r'//\\', True),
    (r'// "some comment"', True),
    (r'new URI("http://www.google.com")', False),
    (r'System.out.println("Escaped quote\""); // Comment', True)
]

tests_passed = 0

for text, expected in tests:
    has_comment = pattern.match(text) is not None
    if has_comment == expected:
        tests_passed += 1

print("Passed {0}/{1} tests".format(tests_passed, len(tests)))
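The test script applies the pattern to one line at a time. To collect comments from a whole source file, one minimal approach (an assumption, not part of the answer) is to split the text into lines first. The negated class `[^"/\\]` also matches newline characters, so running the pattern over the entire file in one go would let it run across line boundaries.

```python
import re

pattern = re.compile(r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$')

java_source = '''class HelloWorld {
    // entry point
    public static void main(String[] args) {
        System.out.println("http://www.example.com"); // print a URL
    }
}'''

# Match one line at a time so the pattern never spans a line break.
comments = []
for line in java_source.splitlines():
    m = pattern.match(line)
    if m:
        comments.append(m.group(1).strip())

print(comments)   # ['entry point', 'print a URL']
```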

edited Mar 15 '13 at 21:25 · answered Mar 15 '13 at 02:50 by martega

Sorry about that, you are completely right. I didn't really put much thought into it at first, as it seemed really easy. After thinking about it a little more deeply, it is definitely more complicated than I had thought. I updated my solution, though, with something that I think handles the general case. – martega Mar 15 '13 at 19:36
You can have a look at my discussion with Mike to check your answer. I don't think that it can handle the general case. – nhahtdh Mar 15 '13 at 19:42
Can you give me an example of a case that doesn't pass? – martega Mar 15 '13 at 19:53
`System.out.println("Escaped quote\""); // Comment` and `new URI("http://www.google.com");` are 2 examples of failure. Unless you match Java strings properly, you will never handle the general case. – nhahtdh Mar 15 '13 at 20:00
I think your second example works with the code above, unless I missed something. I didn't think about escaped " characters before but I'll try to update the regex to take that into account. – martega Mar 15 '13 at 20:16
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/26267/discussion-between-nhahtdh-and-martega) – nhahtdh Mar 15 '13 at 20:22
Thank you, best answer on the whole internet – Avo Asatryan Nov 18 '19 at 12:05

score 3 · Answer 2 · edited Jan 18 '21 at 12:34

I think this works (using pyparsing):

data = r"""
class HelloWorld {

    // method main(): ALWAYS the APPLICATION entry point
    public static void main (String[] args) {
        System.out.println("Hello World!"); // Nested //Print 'Hello World!'
        System.out.println("http://www.example.com"); // Another nested // Print a URL
        System.out.println("\"http://www.example.com"); // A nested escaped quote // Print another URL
    }
}"""


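from pprint import pprint

from pyparsing import QuotedString, cStyleComment, dblSlashComment

# Double- and single-quoted literals, with backslash as the escape character
dbls = QuotedString('"', '\\')
sgls = QuotedString("'", '\\')
strings = dbls | sgls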
pprint(dblSlashComment.ignore(strings).searchString(data).asList())

[['// method main(): ALWAYS the APPLICATION entry point'],
 ["// Nested //Print 'Hello World!'"],
 ['// Another nested // Print a URL'],
 ['// A nested escaped quote // Print another URL']]

If you have /* ... */ style comments that happen to contain single-line comments, and you don't want those, you can use:

pprint(dblSlashComment.ignore(strings | cStyleComment).searchString(data).asList())

(as discussed in https://chat.stackoverflow.com/rooms/26267/discussion-between-nhahtdh-and-martega)
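For readers who would rather avoid the pyparsing dependency, here is a rough standard-library sketch of the same idea: match string literals (and, optionally, `/* ... */` blocks) as whole tokens so that a `//` inside them is never taken as a comment start. This swaps in a different technique, a single regex alternation scanned with `re.finditer()`, rather than pyparsing's `ignore()`. `TOKEN` and `line_comments` are hypothetical names, and this is not a full Java lexer.

```python
import re

# At each position the alternatives are tried left to right, so string
# literals and block comments are consumed whole before "//" inside them
# can be seen. Only the last alternative has a capturing group.
TOKEN = re.compile(r'''
    "(?:\\.|[^"\\\n])*"      # double-quoted string literal
  | '(?:\\.|[^'\\\n])*'      # single-quoted char literal
  | /\*.*?\*/                # block comment, skipped like cStyleComment above
  | (//[^\n]*)               # single-line comment
''', re.VERBOSE | re.DOTALL)

def line_comments(source):
    """Return every // comment that is not inside a literal or block comment."""
    return [m.group(1) for m in TOKEN.finditer(source) if m.group(1)]

src = r'''
System.out.println("http://www.example.com"); // Another nested // Print a URL
System.out.println("\"http://www.example.com"); // A nested escaped quote
/* a block comment // not a line comment */
char c = '"'; // quote char
'''

print(line_comments(src))
```

Unmatched characters (identifiers, operators, a lone `/`) are simply skipped by `finditer()`, which is why only the tokens that can hide a `//` need alternatives.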
