0

I have two types of comments, shown below. My regular expression \/\*.*\*\/ finds the first type but not the second. I think this is because the second type spans multiple lines?

What modification does the regular expression need in order to find both types of comments?

First type:

/* Comment type1 */

Second type:

/* 

 * JD-Core Version:    0.7.0.1

 */
Ahmet Karakaya
  • What programming language/tool/regex engine are you using? The semantics of `.` and the available flags to change them can differ – Bergi Jun 20 '14 at 11:38
  • You haven't said what language you're using - it will make a difference to the content and format of a helpful answer. – Bohemian Jun 20 '14 at 11:40
  • I believed that regular expressions were common to all tools. Anyway, I used Notepad++ and the Eclipse IDE – Ahmet Karakaya Jun 20 '14 at 11:41

5 Answers

6

I suggest another solution:

\/\*([\S\s]+?)\*\/

This avoids the dot, which is greedy in resources.
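To make the suggestion concrete, here is a minimal sketch in Python (my choice for illustration; the question only mentions Notepad++ and Eclipse). The sample text is made up from the two comments in the question, and the names `COMMENT` and `SOURCE` are hypothetical.

```python
import re

# Pattern from the answer above. [\S\s] matches any character,
# newlines included, so no engine-specific flag is needed.
COMMENT = re.compile(r"\/\*([\S\s]+?)\*\/")

# Made-up sample containing both comment styles from the question.
SOURCE = "/* Comment type1 */\nint x;\n/* \n * JD-Core Version:    0.7.0.1\n */\n"

# group(0) is the whole comment, delimiters included.
matches = [m.group(0) for m in COMMENT.finditer(SOURCE)]
for text in matches:
    print(repr(text))
```

One thing to keep in mind: `+?` requires at least one character between the delimiters, so an empty comment `/**/` on its own would not match. Using `*?` instead would allow it.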

Hermios
  • It works, thanks. Why do the other solutions not work, even though they work successfully on the DEMO page? Don't regular expressions share a common syntax? – Ahmet Karakaya Jun 20 '14 at 11:53
  • That's like saying "I'm using french fries to avoid potatoes." `[\S\s]` is equivalent to `.`, except that the former will actually match *more* than the dot (newlines too). – tckmn Jun 20 '14 at 11:57
  • What does ([\S\s]+?) mean? Could you explain a little? – Ahmet Karakaya Jun 20 '14 at 12:02
  • I cannot explain why it is less greedy. I just learned it, sorry. ([\S\s]+?): you look for any whitespace character (\s) or, through the [], anything that is not whitespace (\S). So, in the end, you match any character. + means one or more repetitions (1 to infinity). +? means you stop at the first occurrence of whatever comes after it. Regex is greedy by nature: without the ?, it looks for the last occurrence of whatever comes after. I hope that is clear (if not, let me know and I will try to explain it another way) – Hermios Jun 20 '14 at 12:13
  • * is a special character in regex and is interpreted as a quantifier. Use the escape character, \*, to match a literal * – Hermios Jun 20 '14 at 12:14
  • @mmc18 It was a rhetorical question. If you were to time the two, `[\S\s]` is probably slower than the `.*` in `(?s)/\*.*?\*/`. Of course on a modern processor we are talking millionths of seconds and it does not matter at all. They match the same, one of them is just a little harder to read. Also, the parentheses around `*([\S\s]+?)` are not needed... Consuming too much resources!... lol – zx81 Jun 21 '14 at 03:40
  • @mmc18 Also, if you have further questions about this, feel free to visit the [regex chatroom](http://chat.stackoverflow.com/rooms/25767/regex) – zx81 Jun 21 '14 at 03:42
3

Assuming your language does not check if comments are nested, you can go for this:

(?s)/\*.*?\*/

You say you use Notepad++: here is a screenshot of the regex at work.

Notepad++ Regex
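For readers without Notepad++ at hand, the effect of the inline `(?s)` flag can be sketched in Python, whose `re` module also accepts `(?s)` at the start of a pattern. Python and the sample text are assumptions of this sketch, not part of the question.

```python
import re

# Made-up sample: one single-line and one multi-line comment.
SOURCE = "/* Comment type1 */\n/* \n * JD-Core Version:    0.7.0.1\n */\n"

# Without (?s) the dot stops at a newline, so the second comment is missed.
without_flag = re.findall(r"/\*.*?\*/", SOURCE)

# With (?s) the dot also matches newlines, so both comments are found.
with_flag = re.findall(r"(?s)/\*.*?\*/", SOURCE)

print(len(without_flag), len(with_flag))
```

The inline form is handy in search dialogs where there is no separate place to set flags.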

zx81
0

The dot doesn't match newlines, so add the DOTALL flag (s) to your regex:

/\/\*.*?\*\//s

And use a reluctant quantifier for the dot, .*? (i.e., add a question mark), so it will stop at the first closing */
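The difference the question mark makes is easiest to see with two comments in the same text. A minimal Python sketch, assuming Python's `re.DOTALL` as the equivalent of the s flag (the sample string is made up):

```python
import re

# Made-up sample with two comments on one line.
SOURCE = "/* first */ int x; /* second */"

# Greedy: .* runs to the LAST closing */, swallowing the code in between.
greedy = re.findall(r"/\*.*\*/", SOURCE, flags=re.DOTALL)

# Reluctant: .*? stops at the FIRST closing */, giving one match per comment.
lazy = re.findall(r"/\*.*?\*/", SOURCE, flags=re.DOTALL)

print(greedy)
print(lazy)
```

With the greedy form the whole sample comes back as a single match, which is rarely what you want when stripping comments.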

Bohemian
0

Your regex should be the one below, with the s and g modifiers:

/\/\*.*?\*\//sg

If you want to capture the text inside the /* */, then go for the regex below:

/\/\*(.*?)\*\//sg

Explanation:

s    #  Dotall
g    #  global
*?   #  Non-greedy match
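As a rough sketch of the capturing variant in Python (an assumption of this example; Python has no g modifier, and `re.findall` plays that role by returning every match). When the pattern has exactly one group, `findall` returns the group's text rather than the whole match.

```python
import re

# Made-up sample containing both comment styles from the question.
SOURCE = "/* Comment type1 */\n/* \n * JD-Core Version:    0.7.0.1\n */\n"

# One capture group: findall returns only the text between /* and */.
bodies = re.findall(r"/\*(.*?)\*/", SOURCE, flags=re.DOTALL)

# Trim the surrounding whitespace for display.
cleaned = [body.strip() for body in bodies]
print(cleaned)
```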

DEMO

Avinash Raj
0

The dot . does not always match newlines. You can try to use flags to modify this, or use an explicit character class. Here is a version that allows everything but the sequence */ inside the comment:

/\/\*([^\/]|[^*]\/)*\*\//

(demo)
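A small Python sketch of the flag-free pattern, in case the demo link is unavailable. Python, the sample text, and the names `PATTERN` and `SOURCE` are assumptions of this example.

```python
import re

# Pattern from the answer above: no dot and no flags. Each step consumes
# either one non-slash character, or a slash preceded by a non-asterisk.
PATTERN = re.compile(r"\/\*([^\/]|[^*]\/)*\*\/")

# Made-up sample containing both comment styles from the question.
SOURCE = "/* Comment type1 */\nint x;\n/* \n * JD-Core Version:    0.7.0.1\n */\n"

# The group only holds its last repetition, so take group(0) for the full comment.
matches = [m.group(0) for m in PATTERN.finditer(SOURCE)]
print(len(matches))

# Edge case: a slash immediately after the opening /* cannot be consumed.
edge = PATTERN.search("/*/ */")
print(edge)
```

Because a negated class such as `[^\/]` matches newlines, this works the same in engines where the dot does not. The edge case above, a comment whose body starts with `/`, is not matched by this pattern, so it is worth checking whether your input can contain one.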

Bergi