Is there a better way to write a regex that does not match on leading and trailing spaces along with a character limit?

Question

The regex I have is...

^[A-z0-9]*[A-z0-9\s]{0,20}[A-z0-9]*$

The goal is to disallow leading and trailing spaces while limiting the input to 20 characters, which the regex above doesn't do well.
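
A quick sketch of where the pattern above falls short. The unbounded `[A-z0-9]*` on either side of the `{0,20}` part means the total length is never capped, and because the middle class contains `\s`, it can absorb a leading or trailing space. The sample strings are made up for illustration.

```javascript
// The pattern from the question, unchanged.
const original = /^[A-z0-9]*[A-z0-9\s]{0,20}[A-z0-9]*$/;

// Hypothetical input: 30 letters, well over the intended limit.
const thirtyLetters = 'a'.repeat(30);

console.log(original.test(thirtyLetters)); // true: length is not capped
console.log(original.test(' abc'));        // true: leading space is accepted
console.log(original.test('abc '));        // true: trailing space is accepted
```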

I found some similar questions; the closest is "How to validate a user name with regex?". It solved the problem of leading and trailing spaces, but it did not limit the number of characters.

I also saw a way using negation and another negative lookahead, but that didn't work out so well for me.

Is there a better way to write the regex above with the 20 character limit? Repeating the allowed characters is pretty ugly, especially when the list of allowed characters is large and specific.

Is this case sensitive? `[A-z]` is not the same as `[A-Za-z]`. — OnlineCop, Jun 03 '14 at 21:39
To illustrate @OnlineCop's comment: [Difference between regex A-z and a-zA-Z](http://stackoverflow.com/questions/4923380/difference-between-regex-a-z-and-a-za-z) — Robin, Jun 03 '14 at 21:43
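
To make the point in the comments concrete, here is a small sketch showing which characters `[A-z]` lets through that `[A-Za-z]` does not. The six characters listed are the ones between `Z` and `a` in ASCII; the variable names are only illustrative.

```javascript
// Characters that sit between 'Z' (code 90) and 'a' (code 97) in ASCII.
const between = ['[', '\\', ']', '^', '_', '`'];

const wide = /^[A-z]$/;      // one range from 'A' to 'z', including the gap
const strict = /^[A-Za-z]$/; // letters only

for (const ch of between) {
  // Each line prints the character, then true (wide), then false (strict).
  console.log(JSON.stringify(ch), wide.test(ch), strict.test(ch));
}
```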

score 2 · Answer 1 · answered Jun 03 '14 at 21:37

Hmm, if single-character input should be rejected, I would go with:

^[A-z0-9][A-z0-9\s]{0,18}[A-z0-9]$

If a single character is also acceptable:

^[A-z0-9](?:[A-z0-9\s]{0,18}[A-z0-9])?$
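
A sketch for checking the two patterns above against a few inputs. As an assumption about the intent, the class is written here as `[A-Za-z0-9]` rather than `[A-z0-9]`, following the comments on the question; the sample strings are hypothetical.

```javascript
// Both patterns from this answer, with [A-z] swapped for [A-Za-z] (assumption).
const atLeastTwo = /^[A-Za-z0-9][A-Za-z0-9\s]{0,18}[A-Za-z0-9]$/;
const atLeastOne = /^[A-Za-z0-9](?:[A-Za-z0-9\s]{0,18}[A-Za-z0-9])?$/;

const samples = ['a', 'ab', 'a b', ' ab', 'ab ', 'a'.repeat(20), 'a'.repeat(21)];

for (const s of samples) {
  // Prints the input, then the result for each pattern.
  console.log(JSON.stringify(s), atLeastTwo.test(s), atLeastOne.test(s));
}
```

Only the single-character input `'a'` should differ between the two: the first pattern rejects it and the second accepts it.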

answered Jun 03 '14 at 21:37

Gábor Bakos

Sam · Accepted Answer · 2014-06-03T21:49:03.737

2

Update:

I like this one even better. We use a negative lookahead to make sure there isn't ^\s (whitespace at the beginning of the string) or \s$ (whitespace at the end of the string), and then match one alphanumeric or whitespace character. We repeat this 1-20 times.

/^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i

^            (?# beginning of string)
(?:          (?# non-capture group for repetition)
  (?!        (?# begin negative lookahead)
    ^\s      (?# whitespace at beginning of string)
   |         (?# OR)
    \s$      (?# whitespace at end of string)
  )          (?# end negative lookahead)
  [a-z0-9\s] (?# match one alphanumeric/whitespace character)
){1,20}      (?# repeat this process 1-20 times)
$            (?# end of string)
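
The breakdown above can be exercised with a short sketch like the following. The pattern is the one from this answer; the inputs and their expected results are hypothetical test cases chosen to hit each rule.

```javascript
// The "Update" pattern from this answer, unchanged.
const repeatedLookahead = /^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i;

// Hypothetical sample inputs, paired with the result the pattern should give.
const cases = [
  ['John Smith', true],    // inner space is fine
  ['a', true],             // a single character is enough
  [' John', false],        // leading whitespace
  ['John ', false],        // trailing whitespace
  ['', false],             // {1,20} requires at least one character
  ['x'.repeat(21), false]  // over the 20-character limit
];

for (const [input, expected] of cases) {
  // Prints the input, the actual result, and the expected result.
  console.log(JSON.stringify(input), repeatedLookahead.test(input), expected);
}
```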

Initial:

I use a negative lookahead ((?!...)) at the beginning of the string to make sure that we don't start off with whitespace. Then we check for 0-19 alphanumeric (case-insensitive thanks to the i modifier) or whitespace characters. Finally, we make sure we end with a pure alphanumeric character (no whitespace), since we can't use lookbehinds in JavaScript.

/^(?!\s)[a-z0-9\s]{0,19}[a-z0-9]$/i
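
As a rough sanity check, the two patterns in this answer can be run side by side on the same inputs. This sketch only compares them on a handful of hypothetical samples; it is not a proof that they are equivalent.

```javascript
// Both patterns from this answer, unchanged.
const initial = /^(?!\s)[a-z0-9\s]{0,19}[a-z0-9]$/i;
const update = /^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i;

// Hypothetical samples covering the edge cases discussed in the thread.
const samples = ['a', 'a b', ' a', 'a ', '', 'a_b', 'x'.repeat(20), 'x'.repeat(21)];

for (const s of samples) {
  // Prints the input, then the result of each pattern.
  console.log(JSON.stringify(s), initial.test(s), update.test(s));
}
```

On these samples both patterns give the same answer for every input.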

edited Jun 03 '14 at 21:49

answered Jun 03 '14 at 21:43

Sam

I am not too familiar with ?:, so could you give me a short explanation on that? Does the (?!^\s|\s$) actually work that way? If so, I never knew that. – dalawh Jun 04 '14 at 01:31
@dalawh `(?:...)` is a non-capturing group. We need a group to perform the repetition, and the `?:` just prevents an unnecessary capture group. The negative lookahead is a zero-length assertion (in other words, it just checks that the next character(s) don't match). Since this lookahead is inside the repeated non-capturing group, we check at every position to make sure we're not looking at leading or trailing whitespace. If we were in PCRE, we could use one negative lookahead and one negative lookbehind rather than a repeated negative lookahead. – Sam Jun 04 '14 at 01:36
Side note: the repeated negative lookahead will be less efficient than the original solution, because it asserts the lookahead at every single character instead of just at the beginning/end of the string. However, with the assertion being 1 character long and the repetition limited to 20 times, it shouldn't be noticeable. I prefer this solution since it uses one character class and one obvious repetition range, `{1,20}`. – Sam Jun 04 '14 at 01:44
I guess my next question is what a capture is used for? I get the concept of a group. The best example of how it works that I could find was http://stackoverflow.com/questions/3512471/non-capturing-group. It seems like a capture is used for the purpose of further work done on the group captured, so if you don't want to do any additional work, you use the non-capture group? That is where I am mostly lost. – dalawh Jun 04 '14 at 14:57
@dalawh you usually group things for alternation (`|`) or to capture/reference (all groups capture, unless they are non-capturing `(?:...)`). An example of referencing is `/([a-z])\1+/`. This will match `aaa` (since `a` is captured, `\1` is `a`), but not `abc`. You can also use this in substitutions, so matching `(foo).*` and replacing it with `\1bar` will change `foofail` to `foobar`. Since we do not need to reference anything in my example, it isn't necessary to capture it (so I used the non-capturing group). It isn't the biggest deal though, since we aren't referencing anything else. – Sam Jun 04 '14 at 15:24
To explain what I mean by "isn't the biggest deal", take this example. If we match `(abc|xyz)(foo|bar)` and want to replace it with `foo` or `bar`, we would need to use the substitution `\2` (which would change `abcfoo` to `foo`). It feels weird referencing a "second" capture group when the "first" one is never referenced...so I would instead match `(?:abc|xyz)(foo|bar)` and substitute `\1`. – Sam Jun 04 '14 at 15:26
Thanks for the explanation. By any chance, do you know if this type of regex works in C#? I am not too sure how different the two are. – dalawh Jun 05 '14 at 15:00
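
The capture-group examples from the comments above can be sketched in JavaScript as follows. One difference worth noting: inside a pattern the backreference is written `\1`, but in the replacement string passed to `replace()` JavaScript uses `$1`, `$2`, and so on.

```javascript
// Backreference: \1 must repeat whatever group 1 captured.
const repeated = /([a-z])\1+/;
console.log(repeated.test('aaa')); // true
console.log(repeated.test('abc')); // false

// Substitution: the replacement string refers to group 1 as $1.
console.log('foofail'.replace(/(foo).*/, '$1bar')); // "foobar"

// With two capturing groups, the wanted text is group 2...
const viaGroup2 = 'abcfoo'.replace(/(abc|xyz)(foo|bar)/, '$2');
// ...with a non-capturing first group, it becomes group 1.
const viaGroup1 = 'abcfoo'.replace(/(?:abc|xyz)(foo|bar)/, '$1');
console.log(viaGroup2, viaGroup1); // "foo" "foo"
```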

score 0 · Answer 3 · answered Jun 03 '14 at 22:00

I think your regex doesn't actually limit the input to 20 characters: the [A-z0-9]* on either side of the {0,20} part can match any number of characters.
Are you aware that character range [A-z] includes characters [\]^_`?

I think I'd do something like this:

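var MAX_INPUT_LENGTH = 20;
// Trim both ends and collapse every run of whitespace into one space.
input = input.trim().replace(/\s+/g, ' ');
if (input.length > MAX_INPUT_LENGTH ||
    !/^[a-z0-9 ]+$/i.test(input)) {
  // raise exception?
}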

answered Jun 03 '14 at 22:00

djinnit

I specifically need regex. – dalawh Jun 04 '14 at 01:03

score 0 · Answer 4 · answered Jun 03 '14 at 22:48

\S matches a non-whitespace character. Therefore this should match what you're looking for:

^\S.{0,18}\S$

That is, a non-whitespace character (\S), followed by up to 18 characters of any kind (., whitespace or not), and finally another non-whitespace character.

The only limitation of the above regex is that the value must be at least 2 characters. If you need to allow 1 character, you can use:

^\S(.{0,18}\S)?$

If you're looking to validate a user name (as you implied but didn't explicitly state) you're probably looking to allow only numbers, letters, and underscores. In that case, ^\w{1,20}$ will suffice.
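
A short sketch of how the three patterns in this answer behave. Note that `.` accepts any character (other than a line break), so the first two patterns limit length and edge whitespace but not the character set. The sample inputs are hypothetical.

```javascript
// The three patterns from this answer, unchanged.
const twoOrMore = /^\S.{0,18}\S$/;
const oneOrMore = /^\S(.{0,18}\S)?$/;
const userName = /^\w{1,20}$/;

console.log(twoOrMore.test('a'), oneOrMore.test('a'));     // false true
console.log(twoOrMore.test('a b'), oneOrMore.test('a b')); // true true
console.log(twoOrMore.test(' a'), oneOrMore.test(' a'));   // false false
console.log(twoOrMore.test('a!b'), oneOrMore.test('a!b')); // true true: '!' is allowed by '.'

console.log(userName.test('john_smith')); // true
console.log(userName.test('john smith')); // false: \w excludes spaces
```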

answered Jun 03 '14 at 22:48

craigpatik

The minimum char is 0, so it is supposed to pass if it is empty. – dalawh Jun 04 '14 at 01:12
`^(\S(.{0,18}\S)?)?$` will allow it to be empty – craigpatik Jun 04 '14 at 11:45
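
A sketch of the empty-allowing form, written here as `^(\S(.{0,18}\S)?)?$` with the outer optional group wrapping the whole pattern. The inputs are hypothetical.

```javascript
// Outer optional group: either nothing at all, or 1-20 characters
// that start and end with a non-whitespace character.
const emptyOrTrimmed = /^(\S(.{0,18}\S)?)?$/;

console.log(emptyOrTrimmed.test(''));    // true: empty input passes
console.log(emptyOrTrimmed.test('a'));   // true
console.log(emptyOrTrimmed.test('a b')); // true
console.log(emptyOrTrimmed.test(' '));   // false: whitespace only
console.log(emptyOrTrimmed.test('a '));  // false: trailing space
```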

alpha bravo · Answer 5 · 2014-06-04T02:08:15.040

0

Use this pattern: ^(?!\s).{0,20}(?<!\s)$

^(?!\s) start of string, not followed by whitespace
.{0,20} followed by 0 to 20 characters
(?<!\s)$ end of string, not preceded by whitespace

or this pattern ^(\S.{0,18}\S)?$
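
A sketch of both patterns in JavaScript. It assumes an engine with lookbehind support, which was added to the language in ES2018; in engines from 2014 the first pattern would be rejected as a syntax error. The sample inputs are hypothetical.

```javascript
// Both patterns from this answer, unchanged.
// Assumption: the engine supports lookbehind (ES2018 or later).
const withLookbehind = /^(?!\s).{0,20}(?<!\s)$/;
const withoutLookbehind = /^(\S.{0,18}\S)?$/;

console.log(withLookbehind.test(''));               // true: 0 characters allowed
console.log(withLookbehind.test('a'));              // true
console.log(withLookbehind.test(' a'));             // false: leading space
console.log(withLookbehind.test('a '));             // false: trailing space
console.log(withLookbehind.test('x'.repeat(21)));   // false: over the limit

// The second pattern needs at least two characters when non-empty.
console.log(withoutLookbehind.test(''));   // true
console.log(withoutLookbehind.test('a'));  // false
console.log(withoutLookbehind.test('ab')); // true
```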

edited Jun 04 '14 at 02:08

answered Jun 04 '14 at 01:48

alpha bravo

Be careful, no look behind in javascript and your second pattern doesn't accept one character strings. – Robin Jun 04 '14 at 06:12
