4

I tried using a regular expression to filter single-line and multi-line comments from my text file. I am able to filter comments like these:

//it works
/*
* welcome
*/
/* hello*/

but I am not able to remove the following comment:

/*
sample
*/

This is my code:

import java.io.*;
import java.lang.*;


class TestProg
{
public static void main(String[] args) throws IOException {
    removeComment();
}
static void removeComment() throws IOException
{
    try {
        BufferedReader br = new BufferedReader(new FileReader("d:\\data.txt"));
        String line;
        while((line = br.readLine()) != null){
            if(line.contains("/*") && line.contains("*/") || line.contains("//")) {

                System.out.println(line.replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","")); 
            }
            else if(line.contains("/*") || line.contains("*") || line.contains("*/")) {

                continue;
            }
            else
                System.out.println(line); 
        }
        br.close();
    }

    catch(IOException e) {
        System.out.println("OOPS! File could not read!");
    }
}
}

Please help me to solve this...

Thanks in advance.

Thomas

4 Answers

3

Using JavaParser, you could solve it as shown in this proof of concept.

RemoveAllComments

import japa.parser.JavaParser;
import japa.parser.ParseException;
import japa.parser.ast.CompilationUnit;
import japa.parser.ast.Node;
import java.io.File;
import java.io.IOException;

public class RemoveAllComments {

    static void removeComments(Node node) {
        for (Node child : node.getChildrenNodes()) {
            child.setComment(null);
            removeComments(child);
        }
    }

    public static void main(String[] args) throws ParseException, IOException {
        File sourceFile = new File("TestClass.java");
        CompilationUnit cu = JavaParser.parse(sourceFile);
        removeComments(cu);
        System.out.println(cu.toString());
    }
}

TestClass.java, used as the example input source:

/**
 * javadoc comment
 */
class TestClass {

    /*
     * block comment
     */
    static class Cafebabe {
    }

    // line comment
    static interface Commentable {
    }

    public static void main(String[] args) {
    }
}

Output to stdout (storing it in a file is up to you):

class TestClass {

    static class Cafebabe {
    }

    static interface Commentable {
    }

    public static void main(String[] args) {
    }
}
SubOptimal
  • 1
    I strongly suggest using a proper parser because there are all sorts of issues when using regular expressions for parsing code. And JavaParser is a great choice! – Federico Tomassetti Aug 14 '15 at 14:59
  • 2
    I think the code above will leave orphan comments. Remove Comments should be `static void removeComments(Node node) { for (Comment child : node.getAllContainedComments()) { child.remove(); } }` (I'm using JavaParser 3.4.4.) – Loren Oct 19 '17 at 21:06
0

Try this code

import java.io.*;
import java.lang.*;

class Test {

 public static void main(String[] args) throws IOException {
 removeComment();
 }

 static void removeComment() throws IOException {
  try {
      BufferedReader br = new BufferedReader(new FileReader("d:\\fmt.txt"));
      String line;
      boolean comment = false;
      while ((line = br.readLine()) != null) {
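      if (line.contains("/*")) {
          // stay in comment mode only if the comment is not closed on the same line
          comment = line.indexOf("*/", line.indexOf("/*") + 2) < 0;
          continue;
      }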
      if(line.contains("*/")){
          comment = false;
          continue;
      }
      if(line.contains("//")){
          continue;
      }
      if(!comment){
      System.out.println(line);
      }
    }
    br.close();
 }

 catch (IOException e) {
    System.out.println("OOPS! File could not read!");
  }
 }
}

I used the following code as input:

package test;
public class ClassA extends SuperClass {
 /**
 * 
 */
    public void setter(){
    super.set(10);
    }
  /*  public void printer(){
    super.print();
    }
*/    
    public static void main(String[] args) {
//  System.out.println("hi");
    }    
}

My output is:

package test;
public class ClassA extends SuperClass {
    public void setter(){
    super.set(10);
    }
    public static void main(String[] args) {
    }    
}
Keval
  • What will happen with a multi-line comment ending like `*/ String a = "hey! I'm gonna be deleted too!";`, or starting like `String a = "hmm..."; /*` ? – BackSlash Aug 05 '15 at 10:03
  • The OP's sample input does not contain comments like the ones you describe, and the code above solves what is not working for the OP. As for your point, I would say the input code needs to be formatted; otherwise there will be a lot of cases to take care of – Keval Aug 05 '15 at 10:11
  • @BackSlash check out my answer. One regex solves all problem: `([\t ]*\/\*(?:.|\R)*?\*\/[\t ]*\R?)|(\/\/.*)` – Harsh Poddar Aug 05 '15 at 11:17
  • 1
    Just a side note: please don't just dump code but add an explanation as well. Those will help the OP and others in understanding the problem and its solution better - and sometimes you'll see flaws/limitations in a solution while writing the explanation and you can point those out. – Thomas Aug 05 '15 at 12:04
-1

Since you read each line individually, you can't apply a single regex to a comment that spans several lines. Instead, you have to look for single-line comments (//.*) as well as the start and end of multi-line comments (/\*.* and .*\*/). If you find the start of a multi-line comment, keep track of that and treat everything as a comment until you encounter the matching end.

Example:

boolean inComment = false;
while ((line = br.readLine()) != null) {
  //inside a multiline comment, ignore the line unless it contains the end marker
  if (inComment) {
    if (line.contains("*/")) {
      //end of multiline, remove everything until the first */
      //note the reluctant quantifier *? which is necessary to match as little as possible
      //(otherwise .* would match */ as well)
      System.out.println(line.replaceFirst(".*?\\*/", ""));
      inComment = false;
    }
  }
  //single line comment, remove everything after the first //
  else if (line.contains("//")) {
    System.out.println(line.replaceAll("//.*", ""));
  }
  //start of multiline, remove everything after the first /*
  else if (line.contains("/*")) {
    System.out.println(line.replaceAll("/\\*.*", ""));
    //stay in comment mode only if the comment is not closed on the same line
    inComment = !line.contains("*/");
  }
  //ordinary line, print it unchanged
  else {
    System.out.println(line);
  }
}

Edit: an important addition

In your question you're talking about text files, which normally have a regular structure, so you can apply my answer.

But if, as you stated in the title, the files contain Java code, then you have an irregular problem domain. In that case you can't safely apply regex and should use a Java parser instead.

For more information have a look here: "RegEx match open tags except XHTML self-contained tags". Although that question is about applying regex to HTML, the same is true for applying regex to Java, since both are irregular problem domains.
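To illustrate why plain regexes fall short on source code, here is a minimal sketch of a character-by-character scanner that keeps track of whether it is inside a string or char literal. The class name CommentStripper and the method strip are made up for this illustration. It is not a full Java lexer (it ignores unicode escapes and text blocks, and a line comment's trailing \r is dropped along with the comment), so treat it as a starting point rather than a replacement for a real parser.

```java
final class CommentStripper {

    // Scanner states: ordinary code, inside "...", inside '...', inside // or /* */.
    private enum State { CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT }

    /** Returns the source without // and block comments; literals are left untouched. */
    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') { state = State.LINE_COMMENT; i += 2; continue; }
                    if (c == '/' && next == '*') { state = State.BLOCK_COMMENT; i += 2; continue; }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    out.append(c);
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    if (c == '\\' && i + 1 < source.length()) {
                        // copy the escaped character so \" or \' does not end the literal
                        out.append(next);
                        i += 2;
                        continue;
                    }
                    if ((state == State.STRING && c == '"') || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    // the line break ends the comment and is kept in the output
                    if (c == '\n') { state = State.CODE; out.append(c); }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') { state = State.CODE; i += 2; continue; }
                    break;
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "System.out.println(\"Hi//this is a test\"); // real comment\n"
                + "/*\nsample\n*/\nint x = 1;\n";
        System.out.print(strip(sample));
    }
}
```

The `//` inside the string literal survives because the scanner only looks for comment markers while it is in the CODE state, which is exactly the context a per-line regex does not have.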

Community
Thomas
  • @Poddar thanks for your code. Your regex removes the comment from System.out.println("Hi//this is a test"); but the regex should not treat text inside double quotes or single quotes as a comment – Ramya Rajasekar Aug 05 '15 at 11:28
  • @RamyaRajasekar is this the wrong place for your comment or did you just use the wrong user reference? – Thomas Aug 05 '15 at 11:54
-1

Try out the following code:

// Read the entire file into a string 
BufferedReader br = new BufferedReader(new FileReader("filename"));
StringBuilder builder = new StringBuilder();
int c;
while((c = br.read()) != -1){
    builder.append((char) c);
}
String fileData = builder.toString();


// Remove comments
String fileWithoutComments = fileData.replaceAll("([\\t ]*\\/\\*(?:.|\\R)*?\\*\\/[\\t ]*\\R?)|(\\/\\/.*)", "");
System.out.println(fileWithoutComments);

It first reads the entire file into a string and then removes all comments from it. An explanation of the regex can be found here: https://regex101.com/r/vK6lC4/3
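For a quick check of the whole-string idea without touching the file system, here is a small self-contained sketch that runs a similar pattern on an in-memory string. Note that this is a simplified pattern and not the exact regex above: it uses [\s\S] instead of \R to cross line breaks (so it should also compile on Java versions before 8) and it does not trim the surrounding whitespace. The class name RegexCommentDemo and the method removeComments are invented for the example.

```java
import java.util.regex.Pattern;

final class RegexCommentDemo {

    // Block comments: /* ... */ matched reluctantly across line breaks via [\s\S].
    // Line comments: // up to the end of the line ('.' does not match line breaks).
    private static final Pattern COMMENTS = Pattern.compile("/\\*[\\s\\S]*?\\*/|//.*");

    static String removeComments(String text) {
        return COMMENTS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Includes the multi-line case from the question that the per-line approach missed.
        String text = "keep me\n/*\nsample\n*/\n/* hello*/ also kept // gone\n";
        System.out.print(removeComments(text));
    }
}
```

Like any regex-only approach, this still removes `//` or `/*` sequences that appear inside string literals, so it is only suitable for input where that cannot happen.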

Harsh Poddar
  • @RamyaRajasekar can you please elaborate? I have checked it again and it runs just fine. – Harsh Poddar Aug 05 '15 at 12:54
  • @Poddar I am using the command prompt. I got the following exception: Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 17 ([\t ]*\/\*(?:.|\R)*?\*\/[\t ]*\R?)|(\/\/.*) ^ – Ramya Rajasekar Aug 06 '15 at 09:54
  • @RamyaRajasekar, you need to use the following regex instead `([\\t ]*\\/\\*(?:.|\\R)*?\\*\\/[\\t ]*\\R?)|(\\/\\/.*)`. If you copied it from the regex101 website, it won't work as you need to escape the \ – Harsh Poddar Aug 06 '15 at 09:56