2

I want to get the class name from a Java file. For example:

class Mango {

Here I want to get Mango as the class name.

This is the regex I used:

\s*class\s+(\S+)

It works and captures the class name. The problem is that if there is no space between the class name and the opening curly bracket, I get the name as Mango{, as in the following line:

class Mango{

So I want to exclude { from the group. I modified the regex to the following:

\s*class\s+(\S+|[^{])

But it doesn't work and still captures the class name with the curly bracket. How can I get only the class name?

while true

3 Answers

4

Try this regex:

class\s+([\w$]+)

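\w is a word character ([a-zA-Z0-9_]); $ is added to the set because it is also allowed in Java identifiers.

This regex captures only word characters and $, which covers the characters normally used in class names. However, if we assume that the coder used valid characters, you can also try: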

(?<=class\s)\s*(.+?)(?=\s*\{)


which is:

  • (?<=class\s) - positive lookbehind for the word class followed by a whitespace character,
  • \s*(.+?) - zero or more whitespace characters, then one or more of any character, matched lazily,
  • (?=\s*\{) - positive lookahead for optional whitespace followed by a curly bracket {

This captures the class name directly. The regex allows any characters, which can be useful if the coder uses one of the rarely used characters allowed in Java names.
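
As a minimal sketch of how both patterns behave in Java, the following applies each of them to the two inputs from the question. The class name ClassNameRegexDemo and the helper firstGroup are made up for this illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassNameRegexDemo {
    // Word characters plus '$'; the capture stops at the first character outside that set.
    static final Pattern SIMPLE = Pattern.compile("class\\s+([\\w$]+)");

    // Lookaround variant: lazily takes everything up to optional whitespace and '{'.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<=class\\s)\\s*(.+?)(?=\\s*\\{)");

    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"class Mango {", "class Mango{"}) {
            // Both patterns yield "Mango" for both inputs.
            System.out.println(firstGroup(SIMPLE, line));
            System.out.println(firstGroup(LOOKAROUND, line));
        }
    }
}
```

Note that the lookaround variant only matches when a { appears on the same line as the class keyword, whereas the simpler pattern does not care what follows the name.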

m.cekiera
  • Even better would be a regex which matches the Java identifier rules, IIRC: `class\s+([a-zA-Z_]\w*)` (identifiers start with an alphabetic or underscore character, followed by zero or more alphanumerics or underscores). It may be better to use Unicode character classes: I cannot recall if Java identifiers can use characters beyond ASCII. – Richard Aug 03 '15 at 13:00
  • Problem with `\w` is that it will not match all possible characters which can be used in class name like `$`. – Pshemo Aug 03 '15 at 13:00
  • @m.cekiera thanks but `(?<=class\s)\s*(.+)(?=\s+\{)` doesn't capture mango from `class mango{` – while true Aug 03 '15 at 13:09
  • @whiletrue sorry, my obvious mistake: I assumed that there would always be a space after the name. I updated the answer. – m.cekiera Aug 03 '15 at 13:12
3

To accept only the characters that can legally be used in the names of classes or variables, we can use the method Character.isJavaIdentifierPart, which can be referenced in a regex as \p{javaJavaIdentifierPart}, as explained in the Pattern class documentation:

Categories that behave like the java.lang.Character boolean isMethodName methods (except for the deprecated ones) are available through the same \p{property} syntax where the specified property has the name javaMethodName.

Demo:

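import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "class Mango{";

// \p{javaJavaIdentifierPart} matches any character for which
// Character.isJavaIdentifierPart returns true
Pattern p = Pattern.compile("\\s*class\\s+(\\p{javaJavaIdentifierPart}+)");
Matcher m = p.matcher(text);

if (m.find()) {
    System.out.println(m.group(1)); // group 1 holds the class name
} else {
    System.out.println("no match found");
}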

Output: Mango
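
Character.isJavaIdentifierStart can be referenced through the same syntax, so a slightly stricter sketch could require a valid first character and only then accept identifier-part characters. The names IdentifierRegexDemo and className, and the leading \b word boundary, are assumptions added for this illustration and are not part of the demo above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierRegexDemo {
    // \b keeps "class" from matching inside a longer word such as "subclass".
    // The name must begin with an identifier-start character and continue
    // with zero or more identifier-part characters.
    static final Pattern CLASS_NAME = Pattern.compile(
            "\\bclass\\s+(\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*)");

    // Hypothetical helper: returns the captured class name, or null if there is no match.
    static String className(String line) {
        Matcher m = CLASS_NAME.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(className("class Mango{"));     // Mango
        System.out.println(className("class $Inner_1 {")); // $Inner_1
        System.out.println(className("class 1Bad{"));      // null, a digit cannot start a name
    }
}
```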


If you are not going to run this regex in the Java engine, you can use [^\s{] instead of \S, like this:

\s*class\s+([^\s{]+)

It will accept any character except whitespaces and {.

Pshemo
2

Not sure what the question is: do you want an explanation for (a) why your approach does not work or would you like to know (b) what a correct regex for this problem looks like?

If it's the latter, other answers and comments have provided some correct expressions. If it's the former, then consider what

(\S+|[^{])

actually matches. What this basically says is: match everything that is not a whitespace or is not a left curly bracket. Note the "or" in that sentence.

The reason your expression still captures the { after "Mango" is that it satisfies the first part of the sentence: a { is not whitespace, so \S+ consumes it. The second alternative, [^{], is only tried when \S+ cannot match, and even then it matches just a single character, so it never gets a chance to keep the bracket out of the group.

What you want is an expression that encodes: match everything that is not a whitespace and is not a left curly bracket. As mentioned above, other answers/comments to this question show examples on how to achieve this.
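
The difference between the "or" and the "and" reading can be checked directly. In this sketch the alternation from the question is compared with a single negated character class, [^\s{]+, in which every character must be neither whitespace nor {. The names AlternationDemo and capture are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // "Or": \S+ is tried first and happily consumes the '{'.
    static final Pattern ALTERNATION = Pattern.compile("\\s*class\\s+(\\S+|[^{])");

    // "And": each captured character is neither whitespace nor '{'.
    static final Pattern NEGATED = Pattern.compile("\\s*class\\s+([^\\s{]+)");

    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String capture(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture(ALTERNATION, "class Mango{")); // Mango{
        System.out.println(capture(NEGATED, "class Mango{"));     // Mango
        System.out.println(capture(NEGATED, "class Mango {"));    // Mango
    }
}
```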

Thomas