I need to remove all types of comments from my string without affecting the URL defined in that string. When I tried removing the comments with a regular expression, part of the URL was removed as well. I tried the following regex, but the same issue occurs.

    String sourceCode= "/*\n"
                + " * Multi-line comment\n"
                + " * Creates a new Object.\n"
                + " */\n"
                + "public Object someFunction() {\n"
                + " // single line comment\n"
                + " Object obj =  new Object();\n"
                + " return obj; /* single-line comment */\n"
                + "}"
                + "\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";

    sourceCode=sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", "");
    System.out.println(sourceCode);

The comments are removed, but the output looks like this:

    public Object someFunction() {
        Object obj =  new Object();
        return obj; 
    }
    https:

Please help me find a solution for this.

Sunil Kanzar
Ragesh ck
  • Well, first of all, that last line is incorrect: it is not in a `String`, so it should not be there (the code can't compile with it). You should also check whether `//` is inside a String or not – AxelH Jun 16 '17 at 07:16
  • You're better off using a tokenizer which can read the source code and create tokens from it. You can then get the comment text and search for any URL within. – MC Emperor Jun 16 '17 at 07:16
  • Could you please suggest any docs or give an example of using a tokenizer? – Ragesh ck Jun 16 '17 at 07:22
  • For example, http://www.java2s.com/Tutorial/Java/0180__File/TokenizingJavaSourceCode.htm, but there are many around there – MC Emperor Jun 16 '17 at 07:29

3 Answers

Use `[^:]//.*|/\\*((.|\\n)(?!=*/))+\\*/`. The only change is the `[^:]` at the start, which means the character before `//` must not be `:`.
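One caveat may be worth illustrating: `[^:]` is a consuming character class, so the character in front of `//` becomes part of the match and is deleted along with the comment. The sketch below compares it with a negative lookbehind, `(?<!:)`, which is a swapped-in variant and not part of the answer above. The class and method names are hypothetical.

```java
class LookbehindDemo {
    // Pattern from the answer: the character before "//" is consumed by [^:].
    static String withCharClass(String s) {
        return s.replaceAll("[^:]//.*", "");
    }

    // Assumed variant: the lookbehind only checks the previous character
    // and does not make it part of the match.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!:)//.*", "");
    }

    public static void main(String[] args) {
        String line = "return obj;// done";
        System.out.println(withCharClass(line));   // prints "return obj"
        System.out.println(withLookbehind(line));  // prints "return obj;"

        // Both variants leave a "//" that directly follows ':' alone.
        String url = "https://stackoverflow.com/questions/18040431";
        System.out.println(withCharClass(url));
        System.out.println(withLookbehind(url));
    }
}
```

In the question's sample the character before each `//` is a space, so the difference does not show up there.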

I usually use regex101.com to work on regular expressions. Select the Python flavor for your case (escaping differs slightly between languages).

This regexp is quite hard for a human to read, so another solution is to use several simple expressions and process the incoming text in multiple passes, for example:

  1. Remove one-line comments
  2. Remove multiline comments
  3. Process some special cases
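The first two passes could be sketched as follows. This is only a minimal illustration: the two patterns, the class name `MultiPassStripper`, and the method `strip` are assumptions and not taken from the answer, and the third step (special cases such as `//` inside string literals) is deliberately left out.

```java
import java.util.regex.Pattern;

class MultiPassStripper {
    // Block comments: (?s) lets '.' span newlines, and the reluctant
    // quantifier stops at the first closing "*/".
    private static final Pattern BLOCK = Pattern.compile("(?s)/\\*.*?\\*/");

    // Line comments: skip a "//" that directly follows ':' (as in "https://").
    private static final Pattern LINE = Pattern.compile("(?<!:)//.*");

    static String strip(String source) {
        // Block comments are removed first, then line comments.
        String withoutBlocks = BLOCK.matcher(source).replaceAll("");
        return LINE.matcher(withoutBlocks).replaceAll("");
    }

    public static void main(String[] args) {
        String sourceCode = "/*\n * Multi-line comment\n */\n"
                + "public Object someFunction() {\n"
                + " // single line comment\n"
                + " return obj; /* single-line comment */\n"
                + "}\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
        System.out.println(strip(sourceCode));
    }
}
```

Each pattern stays short enough to read and test separately, which is the main point of splitting the work into passes.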

Note: Regular-expression processing is fairly slow. If performance matters, consider another solution, such as your own processor or a third-party library.
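As a rough idea of what "your own processor" might look like, here is a small hand-written scanner that makes a single pass over the text. It is a sketch under stated assumptions: the URL lives inside a double-quoted string literal (as the first comment on the question points out it should), and character literals and text blocks are not handled. The name `CommentScanner` is hypothetical.

```java
class CommentScanner {
    /** Removes line and block comments, copying string literals verbatim. */
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"') {
                // String literal: copy up to the closing quote, honoring escapes.
                out.append(c);
                i++;
                while (i < n) {
                    char d = src.charAt(i);
                    out.append(d);
                    i++;
                    if (d == '\\' && i < n) {
                        out.append(src.charAt(i)); // escaped character
                        i++;
                    } else if (d == '"') {
                        break;
                    }
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to the end of the line, keep the newline.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip past the closing "*/" (or to the end).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "String u = \"http://a.b\"; // c\n/* x */int y;";
        System.out.println(strip(src));
    }
}
```

A bare URL outside a string literal, like the last line of the question's sample, would still be cut at `//` by this scanner.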

EDIT: As @Wiktor suggested, the expression `[^:]//.*|/\\*((?!=*/)(?s:.))+\\*/` is a faster solution, at least 2-3 times faster.
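The point of that edit is that `(?s:.)` matches any character, including a newline, without needing the alternation `(.|\\n)`. The small check below only shows that the two forms accept the same input; it does not measure the speed difference claimed above. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // By default '.' does not match a line terminator.
    static boolean plainDot(String s) {
        return Pattern.matches("a.b", s);
    }

    // The alternation adds '\n' explicitly.
    static boolean alternation(String s) {
        return Pattern.matches("a(.|\\n)b", s);
    }

    // The inline DOTALL flag makes '.' match '\n' as well.
    static boolean inlineFlag(String s) {
        return Pattern.matches("a(?s:.)b", s);
    }

    public static void main(String[] args) {
        String s = "a\nb";
        System.out.println(plainDot(s));     // prints "false"
        System.out.println(alternation(s));  // prints "true"
        System.out.println(inlineFlag(s));   // prints "true"
    }
}
```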

invenit
  • Sorry to edit the answer, but I strongly suggest replacing `(.|\\n)` with `(?s:.)` or `(?s).`. `(.|\\n)+` will make the pattern extremely inefficient. – Wiktor Stribiżew Jun 16 '17 at 07:47
  • I really mean it, `(.|\\n)+` or `(.|\\n)*?` and these variations are performance killers when placed inside a larger regex. – Wiktor Stribiżew Jun 16 '17 at 08:17

You can split your String by "\n" and check each line. Here is the tested code:

    String sourceCode= "/*\n"
                + " * Multi-line comment\n"
                + " * Creates a new Object.\n"
                + " */\n"
                + "public Object someFunction() {\n"
                + " // single line comment\n"
                + " Object obj =  new Object();\n"
                + " return obj; /* single-line comment */\n"
                + "}"
                + "\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";

    String [] parts = sourceCode.split("\n");

    System.out.println(getUrlFromText(parts));

Here is the fetching method:

    private static String getUrlFromText(String []parts) {
        for (String part : parts) {
            if(part.startsWith("http")) {
                return part;
            }
        }

        return null;
    }
ahmetcetin

To be more specific, this expression should be used:

.*[^:]//.*|/\\*((.|\\n)(?!=*/))*\\*/

The pattern you provided cannot remove a `/**/` portion of the code if one is present. (If that is a specific requirement, then it's fine.)

So your expression looks like this:
[image: graph of the original expression]

And it should be:
[image: graph of the suggested expression]

To understand it better, enter the expression `.*[^:]\/\/.*|\/\*((.|\n)(?!=*\/))*\*\/` in a regex visualizer; it will show you a graph of it.

Sunil Kanzar