
I have looked through quite a lot of regex guides on how to use negative lookbehind and lookahead in a regex for an HTML5 `pattern` attribute.

I am trying to match the following pattern: the string has to start and end with a letter from [a-z] and can be up to 30 characters long. It can also include the hyphen (-), but it can never have more than one - in a row.

So, what I came up with so far is this:

^[a-z][a-z(?<!-(-)?!-)]{0,28}[a-z]$

However, I could not get the lookahead and lookbehind to work properly, and I am not quite sure whether I implemented the 30-character maximum correctly. Starting and ending with [a-z] does work fine.
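A sketch of why the attempt misbehaves, assuming it is evaluated as a plain JavaScript RegExp (which is what browsers use for `pattern`): inside square brackets, `(`, `?`, `<`, `!` and `)` lose their special meaning and become literal members of the character class, so no lookbehind or lookahead happens at all, and the class also admits `-` any number of times in a row. The `{0,28}` quantifier between the two single letters does cap the length at 30 (1 + 28 + 1), but it also forces a minimum of 2 characters.

```javascript
// The attempt from the question, copied verbatim. Inside [...] the lookaround
// syntax is NOT interpreted: "(", "?", "<", "!" and ")" are just extra allowed
// characters, next to a-z and "-".
const attempt = /^[a-z][a-z(?<!-(-)?!-)]{0,28}[a-z]$/;

const samples = [
  "a-b",
  "ab-cd",
  "a--b",                    // should be rejected, but the class allows "--"
  "a(?b",                    // "(" and "?" are accepted as literal characters
  "a",                       // rejected: the pattern needs at least 2 characters
  "a" + "b".repeat(29),      // 30 characters
  "a" + "b".repeat(30),      // 31 characters
];

for (const s of samples) {
  console.log(JSON.stringify(s), attempt.test(s));
}
```

Run in Node or a browser console, this reports `true` for `"a--b"` and `"a(?b"`, both of which the requirements reject.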

Some example strings:

'a-b' => true
'a-' => false
'-a' => false
'a--b' => false
'ab-cd' => true
'abc' => true
'a-b-c' => true
Gjert
  • Try `^(?!.{31})[a-z]+(?:-[a-z]+)*$`. It is not clear if you only want to match strings like `abc-xyz-def` and `aaaaaa`, or if `a-()*^a` are also valid. Please add some examples of strings you allow and some you do not allow. – Wiktor Stribiżew Jul 22 '17 at 20:14
  • @WiktorStribiżew Sweet, seems like it's working. Could you explain the structure you used? It seems so different from what I used. – Gjert Jul 22 '17 at 20:17
  • @WiktorStribiżew See update for some examples of allowed strings. – Gjert Jul 22 '17 at 20:18
  • I added an answer with a demo and explanation. – Wiktor Stribiżew Jul 22 '17 at 20:24

2 Answers

You need to use

^(?!.{31})[a-z]+(?:-[a-z]+)*$

See the regex demo

Note that in an HTML5 pattern attribute, the anchors are usually not required as the pattern is anchored on both sides by default.

Details

  • ^ - start of string
  • (?!.{31}) - a negative lookahead (it fails the match if its pattern matches): there cannot be 31 characters other than line break characters, so the string is at most 30 characters long. Alternatively, you may use a positive lookahead, (?=.{1,30}$), which requires 1 to 30 characters in the string.
  • [a-z]+ - 1 or more lowercase ASCII letters
  • (?:-[a-z]+)* - zero or more sequences of:
    • - - a hyphen
    • [a-z]+ - 1 or more lowercase ASCII letters
  • $ - end of string.
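A quick way to check the pattern is to run it against the question's examples. This is a sketch assuming a JavaScript environment (Node or a browser console); the second regex is the positive-lookahead alternative mentioned in the details, and both should give the same results.

```javascript
// Negative lookahead caps the length at 30, then: letters, optionally followed
// by any number of "-letters" groups.
const negLookahead = /^(?!.{31})[a-z]+(?:-[a-z]+)*$/;
// Alternative: a positive lookahead requiring 1 to 30 characters up to the end.
const posLookahead = /^(?=.{1,30}$)[a-z]+(?:-[a-z]+)*$/;

// Examples from the question, plus two length checks.
const cases = [
  ["a-b", true],
  ["a-", false],
  ["-a", false],
  ["a--b", false],
  ["ab-cd", true],
  ["abc", true],
  ["a-b-c", true],
  ["a".repeat(30), true],
  ["a".repeat(31), false],
];

for (const [input, expected] of cases) {
  const neg = negLookahead.test(input);
  const pos = posLookahead.test(input);
  const status = neg === expected && pos === expected ? "ok" : "MISMATCH";
  console.log(JSON.stringify(input), neg, pos, status);
}
```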
Wiktor Stribiżew
  • Do you have time to explain the function of `?:` in the "zero or more sequences of" part? – Gjert Jul 22 '17 at 20:29
  • See [What is a non-capturing group? What does a question mark followed by a colon (?:) mean?](https://stackoverflow.com/questions/3512471) It is used for grouping purposes when we do not need to access the submatch value. E.g. Casimir's pattern can be written as `^(?:[a-z]|\b-\b){1,30}$`. – Wiktor Stribiżew Jul 22 '17 at 20:31
  • Thanks, I'll take a look at the link you provided :) – Gjert Jul 22 '17 at 20:32
  • You may also [see how the HTML5 regex patterns are handled](https://www.w3.org/TR/html5/single-page.html#the-pattern-attribute): *the `pattern` attribute is matched against the entire value, not just any subset (somewhat as if it implied a `^(?:` at the start of the pattern and a `)$` at the end)* – Wiktor Stribiżew Jul 22 '17 at 20:41
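To illustrate the non-capturing group discussed in these comments, here is a small sketch comparing the same structure with `(...)` and with `(?:...)`. Both match the same strings; only the capturing version also stores a submatch, and for a repeated group that is the text from its last iteration.

```javascript
// Identical structure, once with a capturing group and once without.
const capturing = /^[a-z]+(-[a-z]+)*$/;
const nonCapturing = /^[a-z]+(?:-[a-z]+)*$/;

const m1 = "ab-cd-ef".match(capturing);
const m2 = "ab-cd-ef".match(nonCapturing);

// m1 holds the full match plus group 1 (its last iteration, "-ef");
// m2 holds only the full match.
console.log(m1[0], m1.length, m1[1]); // ab-cd-ef 2 -ef
console.log(m2[0], m2.length);        // ab-cd-ef 1
```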

You can use this pattern:

([a-z]|\b-\b){1,30}

The word boundaries prevent hyphens from being consecutive or from appearing at the start or end of the string.

Note that ^ and $ are not needed in the pattern attribute since they are implicit.
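A sketch of how this pattern behaves once the implicit anchors are applied. `compilePatternAttribute` is a hypothetical helper, not a browser API: it only mimics the spec's description of matching "as if" the value were wrapped in `^(?:` and `)$`. The `u` flag is also an assumption, since the exact flags used by the spec have changed over time. Note that `\b-\b` works here only because the allowed characters are letters and hyphens; `\b` would also treat digits and `_` as word characters.

```javascript
// Hypothetical helper: wraps a pattern attribute value so it must match the
// whole input, roughly as the HTML spec describes. The "u" flag is an assumption.
function compilePatternAttribute(pattern) {
  return new RegExp("^(?:" + pattern + ")$", "u");
}

// The answer's pattern, written without anchors as in pattern="...".
// \b-\b only matches a hyphen with a word character (here, a letter) on both sides.
const re = compilePatternAttribute("([a-z]|\\b-\\b){1,30}");

const cases = [
  ["a-b", true],
  ["a-", false],
  ["-a", false],
  ["a--b", false],
  ["ab-cd", true],
  ["abc", true],
  ["a-b-c", true],
  ["a".repeat(31), false],
];

for (const [input, expected] of cases) {
  const actual = re.test(input);
  console.log(JSON.stringify(input), actual, actual === expected ? "ok" : "MISMATCH");
}

// Without the implicit anchors, any matching substring is enough ("a" here):
console.log(/([a-z]|\b-\b){1,30}/.test("a--b")); // true
```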

demo

Casimir et Hippolyte
  • There are frameworks that override the regular HTML5 pattern behavior, so it might be a good idea to keep the anchors where they should be. – Wiktor Stribiżew Jul 22 '17 at 20:27
  • @WiktorStribiżew: the question is about html5, not about any framework. If a framework isn't able to anchor the pattern itself, it should not be used. – Casimir et Hippolyte Jul 22 '17 at 20:27
  • I was not aware that HTML5 adds these anchors. What is the reason for NOT including them in every case? – Gjert Jul 22 '17 at 20:31
  • @PhyCoMath: As I said before, the anchors are implicit. The reason is probably that the pattern is supposed to describe a whole string, not only a part of it. Even if you only want a condition on a part of the string, you can build a pattern describing the whole string with that condition; but you can't always build a simple condition that describes a whole string (perhaps this fact explains the choice). – Casimir et Hippolyte Jul 22 '17 at 20:33
  • This is a _5 star_ answer. +1 –  Jul 22 '17 at 23:49