What is the regular expression for single-line Java comments? I am trying the following grammar:

     def single_comment(t):
          r'\/\/.~(\n)'
          #r'//.*$'
          pass

However, the single-line comments are not being ignored. How can I do this?

Nilesh Agrawal
  • Instead of `.~` do you mean `.*`? – Explosion Pills Mar 15 '13 at 02:23
  • Don't forget to *use* the regex! :P – Mark Rushakoff Mar 15 '13 at 02:25
  • is this `/*..I am single line too...*/` also single line comment? – Kent Mar 15 '13 at 02:26
  • What input string are you matching against, and using which method? `re.match()` is anchored at the beginning of the input string (unlike `re.search()`, it does not scan ahead for a later match), so your second RE [won't match](http://ideone.com/lEZKnX) e.g. `'some code; // comment'`. – millimoose Mar 15 '13 at 02:27
  • You appear to be using raw strings, but are still escaping the slashes? – Moshe Mar 15 '13 at 02:30
  • What happens if you have what looks like a comment, **inside a string**? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Lasse V. Karlsen Mar 15 '13 at 21:13
  • I've suggested a pyparsing answer, which finds the comments... that could be adapted to remove the comments if necessary... – Jon Clements Mar 15 '13 at 21:20
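
The point raised in the comments about `re.match()` can be checked with a short sketch. It assumes the naive pattern `//.*$` from the commented-out line in the question; the name `LINE_COMMENT` and the sample line are made up for illustration. Note that `re.match()` only requires a match starting at position 0, not a match of the whole string.

```python
import re

# Naive pattern: "//" followed by everything up to the end of the line.
# It knows nothing about string literals, so "http://..." is a false positive.
LINE_COMMENT = re.compile(r'//.*$')

line = 'int x = 1; // counter'

# re.match only tries position 0, so a comment after code is not found.
print(LINE_COMMENT.match(line))            # None

# re.search scans forward and finds the comment wherever it starts.
print(LINE_COMMENT.search(line).group(0))  # // counter

# re.match needs a match at the start, not a match of the entire string.
print(re.match(r'int', line).group(0))     # int

# The false positive that the answers below try to avoid.
print(LINE_COMMENT.search('String url = "http://www.example.com"').group(0))
```

The last call reports `//www.example.com"` as a comment, which is why a usable solution has to recognise string literals first.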

2 Answers

This is a Python regular expression for matching single-line comments (only those that start with //, not /* */). Unfortunately, it is pretty ugly, because it has to account for escaped characters and for // inside string literals. If you ever need this in real code, you should look for a more easily understandable solution.

    import re
    pattern = re.compile(r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$')

This is a little script that runs a bunch of test strings against the pattern.

    import re

    pattern = re.compile(r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$')

    # Each entry is (input line, whether it contains a // comment).
    tests = [
        (r'// hello world', True),
        (r'     // hello world', True),
        (r'hello world', False),
        (r'System.out.println("Hello, World!\n"); // prints hello world', True),
        (r'String url = "http://www.example.com"', False),
        (r'// hello world', True),
        (r'//\\', True),
        (r'// "some comment"', True),
        (r'new URI("http://www.google.com")', False),
        (r'System.out.println("Escaped quote\""); // Comment', True),
    ]

    tests_passed = 0

    for text, expected in tests:
        # match() anchors at the start; the pattern itself consumes any leading code.
        has_comment = pattern.match(text) is not None
        if has_comment == expected:
            tests_passed += 1

    print("Passed {0}/{1} tests".format(tests_passed, len(tests)))
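
To get at the comment text itself rather than a yes/no answer, capture group 1 of the same pattern can be read. The following is only a sketch: the helper names `extract_comment` and `comments_in` and the sample Java snippet are hypothetical, and because `.` does not match a newline the pattern is applied one line at a time.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r'^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$'
)


def extract_comment(line):
    """Return the text after the first real // on one line, or None."""
    m = PATTERN.match(line)
    return m.group(1) if m else None


def comments_in(source):
    """Apply extract_comment line by line and keep only the hits."""
    found = (extract_comment(line) for line in source.splitlines())
    return [c for c in found if c is not None]


java = 'String u = "http://x"; // url\nint y = 2;\nint z; // z'
print(comments_in(java))  # [' url', ' z']
```

The `//` inside the string literal on the first line is skipped, and the captured text keeps its leading space; call `.strip()` on it if that is unwanted.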
martega
  • Sorry about that, you are completely right. I didn't really put much thought into it at first as it seemed really easy. After thinking about it a little more deeply, it is definitely more complicated than I had thought. I have updated my solution with something that I think handles the general case. – martega Mar 15 '13 at 19:36
  • You can have a look at my discussion with Mike to check your answer. I don't think that it can handle the general case. – nhahtdh Mar 15 '13 at 19:42
  • Can you give me an example of a case that doesn't pass? – martega Mar 15 '13 at 19:53
  • `System.out.println("Escaped quote\""); // Comment` and `new URI("http://www.google.com");` are 2 examples of failure. As long as you don't match a string in Java properly, you will never reach the general case. – nhahtdh Mar 15 '13 at 20:00
  • I think your second example works with the code above, unless I missed something. I didn't think about escaped " characters before but I'll try to update the regex to take that into account. – martega Mar 15 '13 at 20:16
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/26267/discussion-between-nhahtdh-and-martega) – nhahtdh Mar 15 '13 at 20:22
  • Thank you, best answer in whole internet – Avo Asatryan Nov 18 '19 at 12:05

I think this works (using pyparsing):

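    from pprint import pprint

    # cStyleComment is only needed for the /* ... */ variant further down.
    from pyparsing import QuotedString, cStyleComment, dblSlashComment

    # Raw string, so that \" reaches the parser as a backslash followed by a quote.
    data = r"""
    class HelloWorld {

        // method main(): ALWAYS the APPLICATION entry point
        public static void main (String[] args) {
            System.out.println("Hello World!"); // Nested //Print 'Hello World!'
            System.out.println("http://www.example.com"); // Another nested // Print a URL
            System.out.println("\"http://www.example.com"); // A nested escaped quote // Print another URL
        }
    }"""

    # String literals, which may contain "//" that is not a comment.
    dbls = QuotedString('"', '\\', '"')
    sgls = QuotedString("'", '\\', "'")
    strings = dbls | sgls

    # Search for // comments while skipping over string literals.
    pprint(dblSlashComment.ignore(strings).searchString(data).asList())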

    [['// method main(): ALWAYS the APPLICATION entry point'],
     ["// Nested //Print 'Hello World!'"],
     ['// Another nested // Print a URL'],
     ['// A nested escaped quote // Print another URL']]

If the source also contains /* ... */ style comments that happen to have // sequences inside them, and you do not want those reported, ignore C-style comments as well:

    pprint(dblSlashComment.ignore(strings | cStyleComment).searchString(data).asList())

(as discussed in https://chat.stackoverflow.com/rooms/26267/discussion-between-nhahtdh-and-martega)
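
If the goal is to remove the comments rather than list them, the same idea (recognise strings and block comments first, then treat whatever `//` is left as a comment) can be sketched with the standard `re` module instead of pyparsing. This is a swapped-in technique, not the code from this answer. The names `TOKEN` and `strip_line_comments` are hypothetical, and the sketch assumes plain Java string and character literals (no text blocks).

```python
import re

# One alternation, tried left to right at every position:
# string literal | char literal | block comment | line comment.
TOKEN = re.compile(
    r'''
      "(?:[^"\\\n]|\\.)*"      # double-quoted string with backslash escapes
    | '(?:[^'\\\n]|\\.)*'      # character literal
    | /\*.*?\*/                # block comment, kept unchanged
    | (?P<line>//[^\n]*)       # line comment, the only thing removed
    ''',
    re.VERBOSE | re.DOTALL,
)


def strip_line_comments(source):
    """Drop // comments; copy strings and /* */ comments through untouched."""
    def keep_or_drop(m):
        return '' if m.group('line') is not None else m.group(0)
    return TOKEN.sub(keep_or_drop, source)


src = 'String u = "http://x"; // url\n/* a // b */ int y; // y\n'
print(strip_line_comments(src))
```

Because strings and block comments are consumed as whole tokens, a `//` inside them is never seen by the line-comment branch. Whitespace that preceded a removed comment is left in place.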

Jon Clements