I am working on a homework assignment for which I need to read a Java source file and remove all the comments from it. The rest of the formatting should stay the same.

I have already completed the task by using regex.

But I am looking to accomplish the same without using regex.

Example Input

// My first single line comment

class Student {

  /* Student class - Describes the properties
  of the student like id, name */

  int studentId; // Unique Student id
  String studentName; // Name of the student
  String junk = "Hello//hello/*hey";

} // End of student class

Result

class Student {

  int studentId;
  String studentName;
  String junk = "Hello//hello/*hey";

}

My idea is to read the file line by line:

1) Check the first two characters

  • if it starts with // ==> remove the line

  • if it starts with /* ==> remove all the lines till */

2) Handle comments that follow code on the same line, for example:

int studentId; // Comment or /* Comment */

Can someone suggest a better approach?

user3451476
  • Instead of checking the first two characters, search for `//`, then remove from that point to the new line. Same idea for `/*` except remove until `*/` – Hypnic Jerk Feb 16 '17 at 21:09
  • @HypnicJerk Thank you for the response. If I search the entire line of text and I have a String declaration like String junk = "Hello//hello"; this could cause problems, right? – user3451476 Feb 16 '17 at 21:13
  • @user3451476 Could you provide an example String showing what it looks like now and what you want it to look like? – Nick Div Feb 16 '17 at 21:19
  • @NickDiv I have updated the question – user3451476 Feb 16 '17 at 21:31
  • Obviously, your idea doesn’t work, as you are not considering string literals at all. Keep in mind that within string literals, characters can be escaped via backslash. Oh, and what about `\uxxxx` sequences? Are you supposed to handle these correctly as well? I have my doubts that you really have “completed the task by using regex”. But if you had and understood your regex-based solution, what’s so hard about implementing exactly the same logic manually? – Holger Feb 17 '17 at 17:19
  • Don't forget Unicode: http://stackoverflow.com/q/8115522/7098259 – Patrick Parker Feb 17 '17 at 17:36

1 Answer

If you want to try something other than a regex, then one possibility is a state machine. There are at least five states: Start, In normal code, In // comment, In /* ... */ comment and Stop.

Start in the Start state. Each state processes the input until it comes to something that makes it switch to a different state. The Stop state ends processing, doing any necessary tidying up such as closing files.

Remember that you will need to handle badly formed input, and also sneaky input:

System.out.println("A Java comment may start with /* and finish with */");

I will leave it to you to work out how to handle that.
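
For readers who want to see the shape of such a state machine, here is a minimal sketch in Java. It is one possible way to do it, not the only one. The class name `CommentStripper`, the method `strip`, and the state names are made up for illustration. It adds states for string and character literals so that `//` and `/*` inside quotes are left alone. It makes several simplifying assumptions: the whole file is already in a `String`, `\uxxxx` escapes and text blocks are not handled, an unterminated comment is silently dropped, and whitespace around a removed comment is left in place (so a line that held only a comment becomes an empty line).

```java
final class CommentStripper {

    // Illustrative state names; the Start and Stop states of the answer are
    // implicit here (the loop begins in CODE and ends when input runs out).
    private enum State {
        CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
        STRING, STRING_ESCAPE, CHAR, CHAR_ESCAPE
    }

    static String strip(String source) {
        StringBuilder out = new StringBuilder(source.length());
        State state = State.CODE;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '/') {
                        state = State.SLASH;      // hold the slash back until we know more
                    } else {
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                        out.append(c);
                    }
                    break;
                case SLASH:
                    if (c == '/') {
                        state = State.LINE_COMMENT;
                    } else if (c == '*') {
                        state = State.BLOCK_COMMENT;
                    } else {
                        out.append('/');          // it was an ordinary slash (e.g. division)
                        state = State.CODE;
                        i--;                      // look at c again in the CODE state
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n' || c == '\r') { // keep the line terminator
                        out.append(c);
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*') state = State.BLOCK_STAR;
                    break;
                case BLOCK_STAR:
                    if (c == '/') state = State.CODE;
                    else if (c != '*') state = State.BLOCK_COMMENT;
                    break;
                case STRING:
                    out.append(c);
                    if (c == '\\') state = State.STRING_ESCAPE;
                    else if (c == '"') state = State.CODE;
                    break;
                case STRING_ESCAPE:
                    out.append(c);                // escaped character, e.g. \" or \\
                    state = State.STRING;
                    break;
                case CHAR:
                    out.append(c);
                    if (c == '\\') state = State.CHAR_ESCAPE;
                    else if (c == '\'') state = State.CODE;
                    break;
                case CHAR_ESCAPE:
                    out.append(c);
                    state = State.CHAR;
                    break;
            }
        }
        if (state == State.SLASH) out.append('/'); // input ended right after a lone slash
        return out.toString();
    }
}
```

Calling `CommentStripper.strip(text)` on the contents of a file returns the text with comments removed. Trimming the leftover trailing spaces and empty lines, as in the expected result in the question, would need an extra pass.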

rossum