3

I am trying to create a regular expression (Java/JavaScript) that matches the following regex, but only when there are fewer than 13 characters total (and a minimum of 4).

(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)   ← originally posted

(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ [A-Z]+)?)

These values should (and do) match:

MED-123
COTA-1224
MED4
COTB-892K777
MED-33 DDD
MED-234J5678

This value matches, but I don't want it to (I want to only match if there are fewer than 13 characters total):

COT-1111J11111111111111

See http://regexr.com/3bs7b http://regexr.com/3bsfv

I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching.
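
To illustrate the behaviour described above, here is a minimal JavaScript sketch. The variable names are made up for the demo, and the pattern is anchored with ^ and $ only to make the result easy to read:

```javascript
// The pattern from the question, kept as a string so it can be wrapped in a group.
const base = "(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|( [A-Z]+)?)";

// {4,12} after the group repeats the whole group; it does not count characters.
const repeated = new RegExp("^(" + base + "){4,12}$");

// A single valid value is rejected, because at least four
// back-to-back instances of the expression are required.
const single = repeated.test("MED-123");

// Four instances in a row are accepted, whatever the total length is.
const fourInARow = repeated.test("MED-123MED-123MED-123MED-123");

console.log(single, fourInARow);
```

So the quantifier applies to the number of repetitions of the group, not to the length of the matched text.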

I feel like I am missing something simple...thanks in advance for your help!

Kabb5
  • Try `(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]{1,4})|( DDD)?)\b`. See [demo](https://regex101.com/r/qK2gA2/1). – Wiktor Stribiżew Sep 25 '15 at 19:09
  • Also, you may try [`(?!\S{13})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)`](https://regex101.com/r/eU2cU4/1). Does either work for you? – Wiktor Stribiżew Sep 25 '15 at 19:23
  • This may be a case where trying to shove too much into the regex isn't worth it. Why not just do the regex, and then if you get a match, take a second step to check the length of the matched sequence (`matcher.group(0).length() <= 12` in Java, and similarly in js)? – yshavit Sep 25 '15 at 19:29
  • @stribizhev your second comment (negative look-ahead) works as desired - just like the accepted answer - you should have made this an answer instead of a comment! – Kabb5 Sep 28 '15 at 13:49
  • @yshavit unfortunately, all in one regex was the only option available to me, otherwise, as you suggested, it would certainly be much easier to just check the length of the match – Kabb5 Sep 28 '15 at 13:50
  • You really do not need to check for more than 13 characters, 13 is enough. Use `(?!.{13})` instead of `(?!.{13,})`. – Wiktor Stribiżew Sep 28 '15 at 13:57
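
A minimal JavaScript sketch of the two-step approach yshavit suggests in the comments above, for cases where a second step is available. `isValidCode` is a hypothetical helper name, and the bounds of 4 and 12 are taken from the question:

```javascript
// The pattern from the question, without any length restriction.
const pattern = /(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|( [A-Z]+)?)/;

// Hypothetical helper: run the regex first, then check the length
// of the matched text in ordinary code.
function isValidCode(text) {
  const m = pattern.exec(text);
  return m !== null && m[0].length >= 4 && m[0].length <= 12;
}

console.log(isValidCode("MED-123"));
console.log(isValidCode("COT-1111J11111111111111"));
```

This keeps the regex readable, at the cost of needing code around it.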

4 Answers

2

You can use negative look-ahead:

(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)

Since your expression already makes sure that a match starts with COT or MED and has at least one digit after that, it already guarantees that there are at least 4 characters, so only the upper limit needs the look-ahead.
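
As a rough check, here is a JavaScript sketch that runs this regex against the test strings from the question. The variable names are invented for the example, and the escaped space is written as a plain space:

```javascript
// Negative look-ahead: refuse to start a match where 13 or more
// characters follow on the same line.
const limited = /(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|( DDD)?)/;

const shouldMatch = [
  "MED-123",
  "COTA-1224",
  "MED4",
  "COTB-892K777",
  "MED-33 DDD",
  "MED-234J5678",
];

// Every value from the question's "should match" list is tested.
const results = shouldMatch.map((s) => limited.test(s));

// The 23-character value from the question.
const tooLong = limited.test("COT-1111J11111111111111");

console.log(results, tooLong);
```

Note that the look-ahead counts everything that follows the start of the match on the line, not just the matched text, so this assumes each value sits on its own line as in the test strings.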

Djizeus
  • This did it for me - thank you! Here is this regex working with my test strings: http://regexr.com/3bsg2 – Kabb5 Sep 28 '15 at 13:45
2

I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching.

A quantifier such as {4,12} placed after a group repeats the whole group; that is why it looks for 4 to 12 instances of the expression, and it does not limit the length. Keep your regex and add a word boundary \b at the end, so that a match cannot end in the middle of a run of letters and digits. Take a look at this DEMO.

Your regex is somewhat clumsy and hard to read. It is also limited to specific characters, for example J and K, unless that is what you intend. For a more general pattern, you can try this:

(COT|MED)[AB]?-?[\dJK]{1,8}(\s+D{1,3})?\b

(COT|MED): matches either COT or MED

[AB]?: matches A or B which is optional because of the presence of ?

-?: matches - which is also optional

[\dJK]{1,8}: matches a run of 1 to 8 characters, each of which is a digit, J, or K

(\s+D{1,3})?: matches one or more whitespace characters followed by 1 to 3 D's; the whole group is optional

\b: with respect to your question this is the most important part. It asserts a word boundary at the end of the match, so the match cannot end in the middle of a run of word characters. An input that continues with more letters or digits past the pattern is therefore not matched.

See the demo here DEMO2
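
A short JavaScript sketch of how this pattern behaves on some of the question's test strings (the variable names are made up for the example):

```javascript
// The more general pattern from this answer, ending in a word boundary.
const general = /(COT|MED)[AB]?-?[\dJK]{1,8}(\s+D{1,3})?\b/;

// At most 8 digits/J/K after the prefix, then a word boundary: matches.
const shortValue = general.test("COTB-892K777");

// The optional group accepts whitespace followed by 1 to 3 D's.
const withSuffix = general.test("MED-33 DDD");

// 19 digits/J/K follow the prefix; no prefix of 1 to 8 of them ends
// at a word boundary, so there is no match at all.
const longValue = general.test("COT-1111J11111111111111");

console.log(shortValue, withSuffix, longValue);
```

The limits here come from the fixed {1,8} and {1,3} quantifiers, so this only applies when those bounds fit the data.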

Glorfindel
james jelo4kul
  • Thank you for the detailed answer. I was about to accept, but it's not quite the solution. Looking at the last test string of your first demo link, the value has been changed from my initial example. If you delete the `\b` at the end of the regex there, there is a partial match on the line, whereas my test string on that line was a complete match, so the `\b` is reacting to that partial match. Regarding the regex: yes, it is difficult to read and very limited, but, unfortunately, that is the case I am dealing with. I simplified a bit and should have written `[A-Z]+` instead of `DDD`, which rules out `D{1,3}` – Kabb5 Sep 28 '15 at 13:37
  • Check carefully: the pattern in the link is yours, with the exception of `\b`. It matches part of the last string, unless you didn't post the pattern correctly. To confirm, try testing your pattern in another regex engine; it would yield the same result – james jelo4kul Sep 28 '15 at 14:42
  • Test 'COT-1111J11111111111111' against your regex...it matches, but it should not in order to satisfy my question – Kabb5 Sep 28 '15 at 17:06
  • i just did it now, but it didn't match. test it with demo2 – james jelo4kul Sep 28 '15 at 20:00
  • ok, demo2 works, but it assumes {1,8} and {1,3} in the regex, which is not true to my situation. thanks – Kabb5 Sep 29 '15 at 12:29
1

The answer you are looking for is

(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)

See regex demo

Note that it is almost impossible to check the length of a phrase that is not a whole string or that has spaces inside since boundaries are a bit "blurred". Thus, (?!\S{13}) is a kind of a workaround that just makes sure you do not have a string without whitespace that is 13 characters long or longer.

The regex breakdown:

  • (?!\S{13}) - Fail the match if the next 13 characters are all non-whitespace
  • (?:COT|MED) - Either of the values in the alternation (COT or MED)
  • [ABCD]?-? - Optional A, B, C, D and then an optional -
  • \d{1,4} - 1 to 4 digits
  • (?:[JK]+\d*|(?: [A-Z]+)?) - a group of 2 alternatives:
    • [JK]+\d* - J or K, 1 or more times, and then 0 or more digits
    • (?: [A-Z]+)? - optional space and 1 or more Latin uppercase letters
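
The breakdown above can be tried out with a small JavaScript sketch. The sample sentence and variable names are invented for illustration:

```javascript
// The regex from this answer; the g flag is added here only so that
// String.prototype.match returns every match in the text.
const re = /(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)/g;

// Values embedded in running text: the look-ahead only inspects the
// non-whitespace run that starts at the candidate match position.
const found = "Codes: MED-123 and COTB-892K777 here".match(re);

// A run of 23 non-whitespace characters is rejected by the look-ahead.
const rejected = "COT-1111J11111111111111".match(re);

console.log(found, rejected);
```

One caveat of this workaround: punctuation directly after a value counts as non-whitespace too, so a 12-character value immediately followed by a period is rejected.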
Wiktor Stribiżew
0

As this answer suggests, you could solve it this way:

(?=(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?))(?=.{4,12}$)
Community
  • That works in many server-side regex implementations but I don't think it's supported in Javascript RegExp, which is one of the keywords given. – Paul Kienitz Sep 25 '15 at 19:12
  • @PaulKienitz can't confirm or refute this, since i can't access the javascript docs atm. I'll update as soon as i've got access –  Sep 25 '15 at 19:28