I'm trying to strip all kinds of comments (single-line, inline and multi-line). The initial regex worked fine as long as // and /* */ didn't occur between any kind of quotes, "" or """""". When I modified the regex a bit to exclude the occurrences of // between quotes, it started failing and messing up the initial code structure as well.

Here was my initial regex (Regex:1): (?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)

Here was the regex I tweaked to try and handle the single line comments inside quotes (Regex:2): (?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|[^\"](?://.*)[^\"]

Consider this sample data:

// Comment 1
/* Multiline comments
ends here */  Some text
Random statement // something else
import something..
import something else /* few random stuff
that goes on */ /* Lets try this again */
Text to show
val tryThis = "  something // else "
val tryAgain = "12345" 
val again = " /* kskokds // */ "

Actual result of Regex:1 =>

  Some text
Random statement 
import something..
import something else  
Text to show
val tryThis = "  something 
val tryAgain = "12345" 
val again = "  "

Actual result of Regex:2 =>

// Comment 1
  Some text
Random statementimport something..
import something else  
Text to show
val tryThis = "  somethingval tryAgain = "12345" 
val again = "  "

Expected Result =>

  Some text
Random statement 
import something..
import something else  
Text to show
val tryThis = "  something // else "
val tryAgain = "12345" 
val again = " /* kskokds // */ "
Highdef
  • I don't think it's possible to parse comments with regex. You need a more complex parser that keeps some state/state-machine – pedrorijo91 May 19 '19 at 08:23
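
The state-machine idea from the comment above can be sketched roughly as follows. This is a minimal Java sketch, not the asker's actual solution: the class name `CommentStripper` and the method `strip` are hypothetical, and it assumes non-nested block comments, backslash escapes inside ordinary strings, and triple-quoted strings as in the sample. Character literals such as '"' and nested block comments are deliberately not handled.

```java
public class CommentStripper {
    // Hypothetical lexer states; only what the sample data needs.
    private enum State { CODE, STRING, TRIPLE_STRING, LINE_COMMENT, BLOCK_COMMENT }

    public static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            switch (state) {
                case CODE:
                    if (src.startsWith("//", i)) {
                        state = State.LINE_COMMENT; i += 2;
                    } else if (src.startsWith("/*", i)) {
                        state = State.BLOCK_COMMENT; i += 2;
                    } else if (src.startsWith("\"\"\"", i)) {
                        state = State.TRIPLE_STRING; out.append("\"\"\""); i += 3;
                    } else {
                        if (c == '"') state = State.STRING;
                        out.append(c); i++;
                    }
                    break;
                case STRING:
                    out.append(c); i++;
                    if (c == '\\' && i < n) {
                        // Copy the escaped character so \" does not end the string.
                        out.append(src.charAt(i)); i++;
                    } else if (c == '"' || c == '\n') {
                        // Closing quote, or an unterminated string at end of line.
                        state = State.CODE;
                    }
                    break;
                case TRIPLE_STRING:
                    if (src.startsWith("\"\"\"", i)) {
                        out.append("\"\"\""); i += 3;
                        // Extra quotes right after the terminator still belong to the literal.
                        while (i < n && src.charAt(i) == '"') { out.append('"'); i++; }
                        state = State.CODE;
                    } else {
                        out.append(c); i++;
                    }
                    break;
                case LINE_COMMENT:
                    // Drop everything up to the newline, but keep the newline itself.
                    if (c == '\n') { out.append(c); state = State.CODE; }
                    i++;
                    break;
                case BLOCK_COMMENT:
                    if (src.startsWith("*/", i)) { state = State.CODE; i += 2; } else { i++; }
                    break;
            }
        }
        return out.toString();
    }
}
```

Because the scanner only looks for comment openers while in the CODE state, a // or /* that appears between quotes is copied through unchanged, which is the behaviour the expected result asks for.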

1 Answer

I'm just in time to be the first person to post a link to this famous question: RegEx match open tags except XHTML self-contained tags

A serious answer from that thread:

I think the flaw here is that HTML is a Chomsky Type 2 grammar (context free grammar) and RegEx is a Chomsky Type 3 grammar (regular grammar). Since a Type 2 grammar is fundamentally more complex than a Type 3 grammar (see the Chomsky hierarchy), it is mathematically impossible to parse XML with RegEx.

The syntax of Java comments is not a regular grammar either, so everything that has been said about parsing HTML is applicable here.

simpadjo
  • Note that regexes are not regular expressions. Most regex engines have features that allow them to parse non-regular languages. Exactly *which* languages is usually neither documented nor investigated, though. But, for example, I believe Ruby's `Regexp` can parse all context-free, almost all (or maybe even all) context-sensitive languages, and maybe even some Type 0 ones. They have (named) recursion, for example, which is typically one of those things that proofs about Turing-completeness use. Perl regexes can check for prime numbers! – Jörg W Mittag May 19 '19 at 07:36
  • Thank you, but I was looking to see if anyone could help me actually solve this issue. It turns out I had to write entirely custom code from scratch, avoiding regex and using pattern matching to resolve it. – Highdef May 19 '19 at 19:38
  • You can try to find the place in the source code of IntelliJ where they do it. – simpadjo May 19 '19 at 20:24
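
For the sample in the question, a regex-based pass may still be workable if string literals are matched in the same alternation as comments, so that a // or /* inside quotes is consumed as part of the string before the comment branches get a chance to see it. Regex:2 cannot do this: its `[^\"]` on each side consumes a real character, which is why the newline after each line comment disappears and why a comment at the very start of the input is not matched at all. The following is a rough Java sketch under stated assumptions: the class name `RegexCommentStripper` and the method `strip` are hypothetical, block comments are assumed not to nest, and character literals are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCommentStripper {
    // Group 1 captures string literals (kept); the other branches are comments (dropped).
    private static final Pattern TOKEN = Pattern.compile(
        "(\"\"\"[\\s\\S]*?\"\"\"+"              // triple-quoted string, may span lines
        + "|\"(?:\\\\.|[^\"\\\\\\n])*\")"       // ordinary string with backslash escapes
        + "|/\\*[\\s\\S]*?\\*/"                 // block comment, non-nested
        + "|//.*");                             // line comment, stops before the newline

    public static String strip(String src) {
        Matcher m = TOKEN.matcher(src);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the code between the previous token and this one.
            out.append(src, last, m.start());
            // Keep string literals verbatim; emit nothing for comments.
            if (m.group(1) != null) {
                out.append(m.group(1));
            }
            last = m.end();
        }
        out.append(src, last, src.length());
        return out.toString();
    }
}
```

The order of the alternatives matters less than the fact that the engine scans left to right: whichever token starts first wins, so a quote that opens before a // claims the whole literal. This is effectively a small tokenizer written as a regex, and it shares the limits of one: anything the pattern does not model (nested comments, character literals, string interpolation) can still be mis-stripped.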