I am currently developing a corrector for Java in my text editor. To do so, I think the best way is to use Pattern to look for elements of Java syntax (import or package declarations, class or method declarations...). I have already written some of these patterns:

private String regimport = "^import(\\s+)(static |)(\\w+\\.)*(\\w+)(\\s*);(\\s*)$",
                // dotted name: zero or more "segment." parts followed by a final segment
                regpackage = "^package(\\s+)(\\w+\\.)*(\\w+)(\\s*);(\\s*)$",
                // each modifier carries its own trailing whitespace, so a bare "class Foo" also matches
                regclass = "^(((public|abstract|final)(\\s+))*)class(\\s+)(\\w+)(((\\s+)(extends|implements)(\\s+)(\\w+))|)(\\s*)(\\{)?(\\s*)$";

It's not very difficult so far, but I am afraid it will take a long time to finish. Does anyone know if something similar already exists?

joub
  • I highly doubt that Java syntax can be fully expressed in a regex. – SLaks Oct 03 '12 at 17:19
  • I feel your approach to building a Java corrector is wrong. You need not parse the language; you have to tokenize it first and then check its syntax. That will make your life easier. – Aravind.HU Oct 03 '12 at 17:24
  • Java is not a regular language (although Java regex isn't regular either), so you are in dire straits right there. But I believe there are Java parsers written in Java out there. – NullUserException Oct 03 '12 at 17:24
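The "tokenize first" suggestion in the comments can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not a full Java lexer: the class name TinyTokenizer and its tokenize method are made up for this example, and numbers, operators, and Unicode escapes are handled only crudely. The point is that comments are dropped before any syntax check sees the text.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyTokenizer {

    // Splits source text into words (identifiers, keywords, digit runs) and
    // single-character symbols. Whitespace, // comments and /* */ comments are
    // skipped; string and char literals are kept as one token each.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("//", i)) {
                // line comment: skip to the end of the line
                while (i < n && src.charAt(i) != '\n') i++;
            } else if (src.startsWith("/*", i)) {
                // block comment: skip to the closing */ (or to the end of input)
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                // literal: read up to the matching unescaped quote
                int start = i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                i = Math.min(i + 1, n);
                tokens.add(src.substring(start, i));
            } else if (Character.isJavaIdentifierStart(c) || Character.isDigit(c)) {
                int start = i;
                while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else {
                // any other character becomes a one-character symbol token
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "/* import fake.Thing; */\nimport java.util.List; // real\nclass A {}";
        System.out.println(tokenize(src));
    }
}
```

With the sample input in main, the commented-out import never reaches the token list; only the tokens of the real import and the class declaration remain. A syntax check can then work on tokens instead of raw lines.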

3 Answers


To do so I think the best way is to use Pattern to look for element of java syntax

Incorrect. Regular expression patterns cannot adequately identify Java syntax elements. That is why much more complex parsers exist. For a simple example, just imagine how you would avoid a false match for a reserved word inside a comment, such as the following:

/* this is not importing anything
import java.util.*;
*/
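To make the false match concrete, here is a small sketch that runs the import pattern from the question, line by line, over a comment like the one above. The class name CommentFalseMatch and the helper firstImport are invented for this illustration, and the wildcard import is swapped for java.util.List because the pattern as written does not accept `*`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentFalseMatch {

    // The import pattern from the question, compiled so that ^ and $ match per line.
    static final Pattern IMPORT = Pattern.compile(
            "^import(\\s+)(static |)(\\w+\\.)*(\\w+)(\\s*);(\\s*)$", Pattern.MULTILINE);

    // Returns the first line the pattern accepts as an import, or null if there is none.
    static String firstImport(String src) {
        Matcher m = IMPORT.matcher(src);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        String src = "/* this is not importing anything\n"
                   + "import java.util.List;\n"
                   + "*/\n"
                   + "class A {}\n";
        // The line sits inside a block comment, yet the pattern still reports it.
        System.out.println(firstImport(src));
    }
}
```

The pattern reports the commented-out line as an import because it has no notion of being inside a comment. Knowing that requires tracking state across lines, which is what a lexer or parser does.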

But if you are very keen to use regular expressions and willing to spend a lot of effort, look at Emacs font-lock-mode, which uses regular expressions to identify and fontify syntax elements.

PS: The "lot of effort" I mention refers to learning how Emacs works, reading elisp code, and translating Emacs regexps to Java. If you already know all that, you will need less effort.

NullUserException
Miserable Variable

Thank you all for your answers. I think I'm going to work with javaparser AST, it will be a lot easier :)

Here is some code that checks for errors with the AST:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.eclipse.jdt.core.compiler.IProblem;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.CompilationUnit;

public class Main {

    public static void main(String[] args) {

        String text;
        try {
            // your personal java source file (assumed here to be UTF-8 encoded)
            byte[] bytes = Files.readAllBytes(Paths.get("/root/java/Animbis.java"));
            text = new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // without the source there is nothing to parse
            e.printStackTrace();
            return;
        }

        // AST.JLS2 only covers older (Java 1.4-level) syntax; pick the AST.JLS*
        // constant that matches the sources you want to check
        ASTParser parser = ASTParser.newParser(AST.JLS2);
        parser.setSource(text.toCharArray());

        // parse the file
        CompilationUnit unit = (CompilationUnit) parser.createAST(null);

        // report every problem the parser found, with its line number
        for (IProblem problem : unit.getProblems()) {
            String msg = problem.getMessage() + " line: " + problem.getSourceLineNumber();
            if (problem.isError()) {
                msg = "Error:\n" + msg;
            } else if (problem.isWarning()) {
                msg = "Warning:\n" + msg;
            }
            System.out.println(msg);
        }
    }
}

To run it, you need the following jars on the classpath:

org.eclipse.core.contenttype.jar
org.eclipse.core.jobs.jar
org.eclipse.core.resources.jar
org.eclipse.core.runtime.jar
org.eclipse.equinox.common.jar
org.eclipse.equinox.preferences.jar
org.eclipse.jdt.core.jar
org.eclipse.osgi.jar

I got this information from Eclipse ASTParser and Example of ASTParser.

joub

Java's complete syntax cannot be parsed by RegEx. They are different classes of language. Java is at least a Chomsky type 2 language, whereas RegEx is type 3, and type 2 is fundamentally more complex than type 3. See also this famous answer about parsing HTML with RegEx... it's essentially the same problem.
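One way to see the difference in practice: Java blocks can nest to any depth, and checking that braces pair up needs a counter (or a stack) that can grow without bound, which a classical regular expression cannot express. The sketch below is only an illustration of that idea. The class name BraceDepth is made up, and it deliberately does not handle braces inside comments or string literals.

```java
final class BraceDepth {

    // Returns true when every '{' has a matching '}' in the right order.
    // Braces inside comments and string literals are NOT handled here
    // (an assumption of this sketch); a real checker would skip them first.
    static boolean balanced(String src) {
        int depth = 0;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (--depth < 0) return false; // closing brace with nothing open
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(balanced("class A { void f() { if (x) { } } }")); // true
        System.out.println(balanced("class A { void f() { }"));              // false
    }
}
```

A full parser generalizes this counter into a stack of grammar rules, which is why parser-based tools are the usual choice for checking Java source.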

Community
Joe K
  • Java is not type 2 since it's definitely *not* context-free. But then again, modern regex engines (Java's included) are not regular either. And that "famous answer" has become such a tired meme... If you know what you're doing, you can go far with regex: http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326 – NullUserException Oct 03 '12 at 18:07
  • True - I edited "probably" to "at least". And regardless, RegEx is definitely NOT the way to do this. – Joe K Oct 03 '12 at 18:11