
Suppose that *a* is a Java identifier. I would like a regex to match things like this:

#a    #a.a.a    (.a repeated any number of times)

but not this:

#a.    (ending with a dot)

So in a phrase like "#a.a less than #a." it should match only the first #a.a (because it doesn't end with a dot).

This regex:

#[a-zA-Z_$][\\w$]*(\\.[a-zA-Z_$][\\w$]*)*

almost does the job, but it also matches the last case: given #a. it still matches the #a part.
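To make the problem concrete, here is a small Java sketch (the class `QuestionRegexDemo` and its `findAll` helper are hypothetical names) that runs the regex above with `Matcher.find()` over the sample phrase. It reports both `#a.a` and the unwanted `#a` taken from the trailing `#a.`, because nothing stops the `.identifier` group from simply matching zero times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: runs the question's regex with Matcher.find().
class QuestionRegexDemo {
    // The question's regex as a Java string literal (no lookahead yet).
    static final Pattern ORIGINAL =
            Pattern.compile("#[a-zA-Z_$][\\w$]*(\\.[a-zA-Z_$][\\w$]*)*");

    // Collects every substring reported by successive find() calls.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [#a.a, #a]; the second entry is the unwanted "#a" from "#a."
        System.out.println(findAll("#a.a less than #a."));
    }
}
```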

Thank you.

Marcos

cathulhu
Marcos
  • Possible duplicate: http://stackoverflow.com/questions/5205339/regular-expression-matching-fully-qualified-class-names – Lii May 02 '16 at 10:39
  • Although most Java identifiers use ASCII, any Unicode letter is allowed, so it's better to use \p{L} instead of a-zA-Z. – hd42 Jul 09 '21 at 06:53
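Following the suggestion in the last comment, a Unicode-aware variant might look like the sketch below. Instead of \p{L} it swaps in `java.util.regex`'s `\p{javaJavaIdentifierStart}` and `\p{javaJavaIdentifierPart}` classes, which defer to `Character.isJavaIdentifierStart`/`isJavaIdentifierPart` and so also cover `$`, `_` and digits in the right positions; that substitution is my own choice, not something from the thread. The class name `UnicodeIdentifierDemo` is hypothetical, and the trailing lookahead uses the same idea as the answers below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: Unicode-aware version of the qualified-name regex.
class UnicodeIdentifierDemo {
    // These classes defer to Character.isJavaIdentifierStart / isJavaIdentifierPart.
    private static final String START = "\\p{javaJavaIdentifierStart}";
    private static final String PART = "\\p{javaJavaIdentifierPart}";

    static final Pattern QUALIFIED = Pattern.compile(
            "#" + START + PART + "*"            // first identifier
            + "(?:\\." + START + PART + "*)*"   // zero or more ".identifier"
            + "(?!" + PART + "*\\.)");          // reject anything ending in a dot

    // Collects every substring reported by successive find() calls.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = QUALIFIED.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The escapes below encode the non-ASCII letters e-acute and i-diaeresis.
        System.out.println(findAll("#caf\u00e9.na\u00efve less than #x."));
    }
}
```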

2 Answers


This can be accomplished with a negative lookahead. The regex first looks for "#text_$", then for ".text_$" zero or more times. The match is rejected if it is followed by zero or more "text_$" characters and a period. This assumes the i (case-insensitive) modifier is on.

At first I only checked that the match was not followed by a period, but then the regex engine simply backtracked and dropped the last character from the match.
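A minimal Java sketch of that pitfall (the class `NaiveLookaheadDemo` and the "naive" pattern are hypothetical, written only for comparison): with just `(?!\.)` at the end, the engine backtracks on `#abc.` and returns `#ab`, whereas the lookahead `(?![a-z_$\d]*\.)` also rejects every shorter candidate, so there is no match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing a naive lookahead with the answer's.
class NaiveLookaheadDemo {
    // Naive: only forbids a '.' immediately after the match.
    static final Pattern NAIVE = Pattern.compile(
            "#[a-z_$][a-z_$\\d]*(\\.[a-z_$][a-z_$\\d]*)*(?!\\.)",
            Pattern.CASE_INSENSITIVE);

    // Answer's version: forbids identifier characters followed by a '.'.
    static final Pattern FIXED = Pattern.compile(
            "#([a-z_$][a-z_$\\d]*)(\\.[a-z_$][a-z_$\\d]*)*(?![a-z_$\\d]*\\.)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first match, or null if find() fails.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(NAIVE, "#abc.")); // #ab (backtracked)
        System.out.println(firstMatch(FIXED, "#abc.")); // null (no match)
    }
}
```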

#([a-z_$][a-z_$\d]*)(\.[a-z_$][a-z_$\d]*)*(?![a-z_$\d]*\.)

Results

#abc           => YES
#abc.abc       => YES
#abc.a23.abc   => YES
#abc.abc.abc.  => NO
#abc.2bc.abc   => NO
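The table above can be reproduced with a short Java sketch (the class `ResultsTableDemo` and its `hasMatch` helper are hypothetical). It compiles the regex with `Pattern.CASE_INSENSITIVE`, standing in for the i modifier the answer assumes, and treats YES as "`Matcher.find()` reports a match somewhere in the input".

```java
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the YES/NO table.
class ResultsTableDemo {
    // The answer's regex; CASE_INSENSITIVE plays the role of the i modifier.
    static final Pattern QUALIFIED = Pattern.compile(
            "#([a-z_$][a-z_$\\d]*)(\\.[a-z_$][a-z_$\\d]*)*(?![a-z_$\\d]*\\.)",
            Pattern.CASE_INSENSITIVE);

    // YES means find() reports a match somewhere in the input.
    static boolean hasMatch(String input) {
        return QUALIFIED.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"#abc", "#abc.abc", "#abc.a23.abc", "#abc.abc.abc.", "#abc.2bc.abc"};
        for (String s : inputs) {
            System.out.printf("%-15s => %s%n", s, hasMatch(s) ? "YES" : "NO");
        }
    }
}
```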


Daniel Gimenez
  • @Marcos: added digits. The accepted answer also did not work for digits. – Daniel Gimenez Jul 18 '13 at 12:50
  • The complete regex that I'm using is this: (?i)#[a-zA-Z_$][\\w$]*(?:\\.[a-zA-Z_$][\\w$]*)*(?!\\w*\\.) So it works with digits. – Marcos Jul 18 '13 at 12:54
  • @DanielGimenez: My answer definitely works with digits, you can try yourself. – anubhava Jul 18 '13 at 12:55
  • @anubhava you're right. I suppose our answers are redundant, because in the end I reached the same answer you had, except without the `\w`. I will delete mine in a few minutes, once I know you've read this comment. – Daniel Gimenez Jul 18 '13 at 13:01
  • @DanielGimenez: Yes, at this point I think both answers look the same (now that you use `\w`). But I would just leave it as it is; why delete? – anubhava Jul 18 '13 at 13:16

You almost got it right, but a few minor adjustments are needed. Consider this regex:

#[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*(?!\w*\.)

Live Demo: http://www.rubular.com/r/kJbSJKHhtv

Translated to Java:

(?i)#[a-z_$][\\w$]*(?:\\.[a-z_$][\\w$]*)*(?!\\w*\\.)
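A minimal sketch of using the Java string above with `Matcher.find()`, applied to the input from the comment below (the class `AnubhavaRegexDemo` and its `findAll` helper are hypothetical). Because the lookahead contains `\w*`, it reports only `#a.a` and `#var`; the trailing `#a.aaa.` is rejected completely rather than being shortened to `#a.aa`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class running the Java translation of the regex.
class AnubhavaRegexDemo {
    // (?i) enables case-insensitive matching inline.
    static final Pattern QUALIFIED =
            Pattern.compile("(?i)#[a-z_$][\\w$]*(?:\\.[a-z_$][\\w$]*)*(?!\\w*\\.)");

    // Collects every substring reported by successive find() calls.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = QUALIFIED.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [#a.a, #var]
        System.out.println(findAll("#a.a less than #var and #a.aaa."));
    }
}
```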
anubhava
  • With the input "#a.a less than #var and #a.aaa." it still matches #a.aa. I would like it to ignore the trailing #a.aaa. completely. – Marcos Jul 18 '13 at 12:37