
I've only dabbled in regular expressions and was wondering if someone could help me make a Java regex, which matches a string with these qualities:

  1. It is 1-14 characters long
  2. It consists only of A-Z, a-z and the characters _ or -
  3. The symbols - and _ may appear at most once in total (one or the other, not both), and not at the start

It should match

  • Hello-Again
  • ThisIsValid
  • AlsoThis_

but not

  • -notvalid
  • Not-Allowed-This
  • Nor-This_thing
  • VeryVeryLongStringIndeed

I've tried the following regex string

[a-zA-Z^\\-_]+[\\-_]?[a-zA-Z^\\-_]*

and it seems to work. However, I'm not sure how to do the total character limiting part with this approach. I've also tried

[[a-zA-Z]+[\\-_]?[a-zA-Z]*]{1,14}

but it matches (for example) abc-cde_aa which it shouldn't.

bombax
  • This question appears to be off-topic because it does not display any research or effort albeit stating the desired requisites. – Mena Jun 10 '14 at 20:06
  • Must've caught you all on a bad day, when questions like these are answered: http://stackoverflow.com/questions/6078259/regular-expression-to-limit-all-letters-less-than-100-characters – bombax Jun 10 '14 at 20:39
  • I have put it in the reopen queue following your edits - should be back shortly. I even have a nifty answer for you... – Boris the Spider Jun 10 '14 at 20:43
  • @bombax thank you for pointing out that question. Just voted to close it too. It's not a bad day or any animosity towards you. It's a question of interpretation of what is an acceptable question, as well as a debate over the tools that prevent bad questions from surviving. As such, closing this specific question is up for debate and I applaud Boris the Spider's effort to argue otherwise. – Mena Jun 10 '14 at 20:49
  • @Mena I totally agree that this question should have been closed when it was "give me code". Following the addition of a couple of examples that the OP has attempted, I think this becomes a perfectly good question. – Boris the Spider Jun 10 '14 at 20:51
  • @BoristheSpider oh wow. Just reloaded the page... I had not seen the examples! Voting to reopen. **Edit** must be really slow today, it's already been reopened in the meantime :D – Mena Jun 10 '14 at 21:01

2 Answers

This ought to work:

(?![_-])(?!(?:.*[_-]){2,})[A-Za-z_-]{1,14}

The regex is quite complex, so let me try to explain it.

  • (?![_-]) is a negative lookahead. At the start of the string it asserts that the first character is not _ or -. A negative lookahead "peeks" ahead from the current position, without consuming anything, and checks that what follows doesn't match [_-], which is a character class containing _ and -.
  • (?!(?:.*[_-]){2,}) is another negative lookahead, this time on (?:.*[_-]){2,}, which is a non-capturing group repeated at least two times. The group is .*[_-]: any number of characters followed by the same character class as before. So we don't want to see "some characters followed by _ or -" two or more times.
  • [A-Za-z_-]{1,14} is the simple bit. It just says the characters in the group [A-Za-z_-] between 1 and 14 times.
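To see what each lookahead contributes, here is a rough sketch that compares the full pattern against two cut-down variants, each with one lookahead removed. The class name and the two variant constants are made up for illustration; only the full pattern comes from this answer.

```java
class LookaheadParts {
    // The full pattern from this answer.
    static final String FULL = "(?![_-])(?!(?:.*[_-]){2,})[A-Za-z_-]{1,14}";
    // Hypothetical variant: the "not at the start" lookahead is removed.
    static final String NO_START_CHECK = "(?!(?:.*[_-]){2,})[A-Za-z_-]{1,14}";
    // Hypothetical variant: the "at most one separator" lookahead is removed.
    static final String NO_COUNT_CHECK = "(?![_-])[A-Za-z_-]{1,14}";

    public static void main(String[] args) {
        // String.matches requires the whole input to match the pattern.
        System.out.println("-notvalid".matches(FULL));                // false
        System.out.println("-notvalid".matches(NO_START_CHECK));      // true
        System.out.println("Not-Allow-This".matches(FULL));           // false
        System.out.println("Not-Allow-This".matches(NO_COUNT_CHECK)); // true
    }
}
```

Each variant lets through exactly the kind of input that the removed lookahead was there to reject.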

The second part of the pattern is the most tricky, but is a very common trick. If you want to see a character A repeated at some point in the pattern at least X times you want to see the pattern .*A at least X times because you must have

zzzzAzzzzAzzzzA....

You don't care what else is there. So what you arrive at is (.*A){X,}. Now, you don't need to capture the group - this just slows down the engine. So we make the group non-capturing - (?:.*A){X,}.

What you have is that you only want to see the pattern once, so you want not to find the pattern repeated two or more times. Hence it slots into a negative lookahead.
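The "at least X times" trick can be tried on its own. The helper below is a minimal sketch: occursAtLeast is a made-up name, and it trusts the caller to pass a valid regex fragment. It uses Matcher.lookingAt(), which anchors only at the start of the input, much like a lookahead placed at index 0.

```java
import java.util.regex.Pattern;

class RepeatCount {
    // Hypothetical helper: true if `fragment` (a regex fragment such as "A"
    // or "[_-]") can be found at least `times` times in `input`.
    static boolean occursAtLeast(String input, String fragment, int times) {
        Pattern p = Pattern.compile("(?:.*" + fragment + "){" + times + ",}");
        // lookingAt() matches from the start but need not consume the whole input.
        return p.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(occursAtLeast("zzzzAzzzzAzzzzA", "A", 3));   // true
        System.out.println(occursAtLeast("zzzzAzzzzAzzzzA", "A", 4));   // false
        System.out.println(occursAtLeast("Hello-Again", "[_-]", 2));    // false
        System.out.println(occursAtLeast("Nor-This_thing", "[_-]", 2)); // true
    }
}
```

Wrapping the same expression in (?!...) turns "occurs at least twice" into "must not occur twice or more", which is the second lookahead in the pattern.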

Here is a testcase:

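public class RegexTest {
    public static void main(String[] args) {
        // String.matches must match the whole input, so no ^ or $ anchors are needed.
        final String pattern = "(?![_-])(?!(?:.*[_-]){2,})[A-Za-z_-]{1,14}";
        final String[] tests = {
                "Hello-Again",              // valid: one hyphen, not at the start
                "ThisIsValid",              // valid: letters only
                "AlsoThis_",                // valid: one underscore at the end
                "_NotThis_",                // invalid: starts with _ and has two separators
                "-notvalid",                // invalid: starts with -
                "Not-Allow-This",           // invalid: two hyphens (length is exactly 14)
                "Nor-This_thing",           // invalid: both - and _
                "VeryVeryLongStringIndeed", // invalid: longer than 14 characters
        };
        for (final String test : tests) {
            System.out.println(test.matches(pattern));
        }
    }
}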

Output:

true
true
true
false
false
false
false
false

Things to note:

  1. the character - is special inside character classes. It must go at the start or end of the class (or be escaped), otherwise it specifies a range
  2. lookaround is tricky and often counter-intuitive. It will check for matches without consuming, allowing you to test multiple conditions on the same data.
  3. the repetition quantifier {} is very useful. It has three forms. {X} is repeated exactly X times. {X,} is repeated at least X times. And {X,Y} is repeated between X and Y times (written without a space after the comma).
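Notes 1 and 3 are easy to check with a few throwaway matches. This is only a sketch; the class and constant names are invented for the example.

```java
class ClassAndQuantifierNotes {
    // '-' between two characters forms a range: a through z, no literal hyphen.
    static final String RANGE = "[a-z]+";
    // '-' at the end of the class is a literal hyphen: a, z or -.
    static final String LITERAL_HYPHEN = "[az-]+";

    public static void main(String[] args) {
        System.out.println("a-z".matches(RANGE));          // false
        System.out.println("a-z".matches(LITERAL_HYPHEN)); // true

        // The three quantifier forms, applied to runs of 'a'.
        System.out.println("aa".matches("a{2}"));          // true:  exactly 2
        System.out.println("aaa".matches("a{2}"));         // false: 3 is not exactly 2
        System.out.println("aaaaa".matches("a{2,}"));      // true:  at least 2
        System.out.println("aaaaa".matches("a{2,4}"));     // false: 5 is more than 4
    }
}
```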
Boris the Spider
  • Perfect, I had the feeling you had to do something complex and indeed negative lookahead was something I've never seen before. Thanks a lot! – bombax Jun 10 '14 at 21:13

To check whether a string has the form XXX-XXX, where the -XXX or _XXX part is optional, you can use

[a-zA-Z]+([-_][a-zA-Z]*)?

which is similar to what you already had

[[a-zA-Z]+[\\-_]?[a-zA-Z]*]

but you made a crucial mistake and wrapped it entirely in [...], which turns it into a single character class, and that is not what you wanted.
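A quick way to see the difference, as a sketch with made-up class and constant names: once everything is wrapped in [...], the nested classes are merged into one set of allowed characters, and +, ? and * become ordinary characters in that set instead of quantifiers.

```java
class CharacterClassPitfall {
    // The attempt from the question: one big character class repeated 1-14 times.
    static final String WRAPPED = "[[a-zA-Z]+[\\-_]?[a-zA-Z]*]{1,14}";
    // The sequence that was intended: letters, then an optional separator part.
    static final String SEQUENCE = "[a-zA-Z]+([-_][a-zA-Z]*)?";

    public static void main(String[] args) {
        // Any mix of allowed characters passes, in any order.
        System.out.println("abc-cde_aa".matches(WRAPPED));  // true
        // '+' and '?' are literal members of the class here.
        System.out.println("a+b?".matches(WRAPPED));        // true

        System.out.println("abc-cde_aa".matches(SEQUENCE)); // false
        System.out.println("abc-cde".matches(SEQUENCE));    // true
    }
}
```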

To check that the matched part is only 1-14 characters long, you can use the look-ahead mechanism. Just place

(?=.{1,14}$)

at the start of your regex to make sure that the part from the current position to the end of the input (represented by $) consists of 1 to 14 characters of any kind.

So your final regex can look like

String regex = "(?=.{1,14}$)[a-zA-Z]+([-_][a-zA-Z]*)?";

Demo

String [] data = {
    "Hello-Again",
    "ThisIsValid",
    "AlsoThis_",

    "-notvalid",
    "Not-Allowed-This",
    "Nor-This_thing",
    "VeryVeryLongStringIndeed",
};

for (String s : data)
    System.out.println(s + " : " + s.matches(regex));

Output:

Hello-Again : true
ThisIsValid : true
AlsoThis_ : true
-notvalid : false
Not-Allowed-This : false
Nor-This_thing : false
VeryVeryLongStringIndeed : false
Pshemo
  • Thanks, helpful in understanding the look-ahead mechanism more. – bombax Jun 10 '14 at 21:18
  • @bombax You can also find some useful information in [this answer](http://stackoverflow.com/questions/3802192/regexp-java-for-password-validation/3802238#3802238) to a somewhat similar question. – Pshemo Jun 10 '14 at 21:25