21

I recently needed to create a regular expression to check input in JavaScript. The input could be 5 or 6 characters long and had to contain exactly 5 numbers and one optional space, which could be anywhere in the string. I am not regex-savvy at all and even though I tried looking for a better way, I ended up with this:

(^\d{5}$)|(^ \d{5}$)|(^\d{5} $)|(^\d{1} \d{4}$)|(^\d{2} \d{3}$)|(^\d{3} \d{2}$)|(^\d{4} \d{1}$)  

This does what I need, so the allowed inputs are (if 0 is any number)

'00000'  
' 00000'  
'0 0000'  
'00 000'  
'000 00'  
'0000 0'  
'00000 '

I doubt that this is the only way to achieve such matching with regex, but I haven't found a way to do it in a cleaner way. So my question is, how can this be written better?

Thank you.

Edit:
So, it is possible! Tom Lord's answer does what I needed with regular expressions, so I marked it as a correct answer to my question.

However, soon after I posted this question, I realized that I wasn't thinking right, since every other input in the project was easily 'validatable' with regex, I was immediately assuming I could validate this one with it as well.

Turns out I could just do this:

const validate = function(value) {
    const v = value.replace(/\s/g, '')
    const regex = new RegExp('^\\d{5}$');
    return regex.test(v);
}  

Thank you all for the cool answers and ideas! :)

Edit2: I forgot to mention a possibly quite important detail, which is that the input is limited, so the user can only enter up to 6 characters. My apologies.
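
Because a front-end length limit can be bypassed, the function above could also check the length itself. The following is only a sketch under that assumption: `validateWithLength` is a hypothetical name, and it strips literal spaces rather than all whitespace so that tabs are not accepted.

```javascript
// Hypothetical variant of validate() with an explicit length guard.
function validateWithLength(value) {
    if (typeof value !== 'string') {
        return false;
    }
    // Five digits plus one optional space means 5 or 6 characters in total.
    if (value.length !== 5 && value.length !== 6) {
        return false;
    }
    // Remove literal spaces; exactly five digits must remain.
    // With 6 characters this implies exactly one space, with 5 it implies none.
    return /^\d{5}$/.test(value.replace(/ /g, ''));
}

console.log(validateWithLength('00 000'));
console.log(validateWithLength('0 00 0 0'));
```

With the length guard in place, an input such as '0 00 0 0' is rejected even if the 6-character limit on the input field is not enforced.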

Youcef LAIDANI
EyfI
  • 1
    Not sure you can capture this in a regex without obscene contortions, but should be easy to solve imperatively by simply looping through the input characters. – Jared Smith Jun 09 '17 at 16:14
  • Alternatively do it in 2 parts: count the occurrences of a space in the string, if that's 0 or 1, then go ahead and match against `[\d ]` (or you may have to validate the string length as well) –  Jun 09 '17 at 16:15
  • Regular expressions: Now you have two problems :) – Tom Lord Jun 09 '17 at 16:50
  • "Regular expressions: Now you have two problems :)" I'm not sure if I follow :P – EyfI Jun 09 '17 at 16:53
  • @EyfI https://xkcd.com/1171/ – E.D. Jun 09 '17 at 20:35
  • I had a feeling it will be something along those lines... Thanks :D – EyfI Jun 09 '17 at 20:36
  • 1
    @Eyfl A more thorough analysis of the "now you have two problems": https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ – Kevin Fee Jun 09 '17 at 22:56
  • Your alternate answer also matches '0 00 0 0' (for instance). – abligh Jun 10 '17 at 06:56
  • It would if you could enter eight characters, I forgot to mention an important detail -> edit2 of my question. – EyfI Jun 10 '17 at 06:58
  • [Here's another way](https://i.stack.imgur.com/tqWSu.png) to write it more elegantly. – user541686 Jun 10 '17 at 08:27
  • @EyfI Regarding your "Edit2 ... input is limited" -- this should go without saying, but make sure you're not *relying* on front-end validations! These are very easy to bypass. – Tom Lord Jun 10 '17 at 10:37
  • That's a good reminder. I'll add a check at the beginning of the function to control whether the input is 5 or 6 characters long. – EyfI Jun 10 '17 at 11:14

7 Answers

23

Note: Using a regular expression to solve this problem might not be the best answer. As answered below, it may be easier to just count the digits and spaces with a simple function!

However, since the question was asking for a regex answer, and in some scenarios you may be forced to solve this with a regex (e.g. if you're tied down to a certain library's implementation), the following answer may be helpful:

This regex matches lines containing exactly 5 digits:

^(?=(\D*\d){5}\D*$)

This regex matches lines containing one optional space:

^(?=[^ ]* ?[^ ]*$)

If we put them together, and also ensure that the string contains only digits and spaces ([\d ]*$), we get:

^(?=(\D*\d){5}\D*$)(?=[^ ]* ?[^ ]*$)[\d ]*$

You could also use [\d ]{5,6} instead of [\d ]* on the end of that pattern, to the same effect.
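
As a quick check, the combined pattern can be run against a few inputs in JavaScript. This is a minimal sketch: the constant name and the sample strings are my own, only the pattern itself comes from the answer.

```javascript
// Pattern copied from the answer; the name and the samples are illustrative.
const fiveDigitsOneOptionalSpace = /^(?=(\D*\d){5}\D*$)(?=[^ ]* ?[^ ]*$)[\d ]*$/;

const samples = [
    '00000', ' 00000', '00 000', '00000 ', // expected to match
    '0000', '000000', '0 0 000', 'f7364ffff8f ', // expected to be rejected
];

for (const sample of samples) {
    // JSON.stringify makes leading and trailing spaces visible in the output.
    console.log(JSON.stringify(sample), fiveDigitsOneOptionalSpace.test(sample));
}
```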


Explanation:

This regular expression is using lookaheads. These are zero-width pattern matchers, which means both parts of the pattern are "anchored" to the start of the string.

  • \d means "any digit", and \D means "any non-digit".

  • " " (a literal space character) means "space", and [^ ] means "any non-space".

  • The \D*\d is being repeated 5 times, to ensure exactly 5 digits are in the string.
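
To make the "zero-width" point concrete, here is a small sketch that applies the two lookaheads separately. The variable and function names are my own, not part of the answer.

```javascript
// The two lookaheads from the answer, each anchored at the start of the string.
const hasFiveDigits = /^(?=(\D*\d){5}\D*$)/;
const hasAtMostOneSpace = /^(?=[^ ]* ?[^ ]*$)/;

// A lookahead consumes nothing, so the match is the empty string at index 0.
const match = hasFiveDigits.exec('12 345');
console.log(match.index, match[0].length);

// Both conditions are checked from the same starting position.
function satisfiesBoth(s) {
    return hasFiveDigits.test(s) && hasAtMostOneSpace.test(s);
}

console.log(satisfiesBoth('12 345'));
console.log(satisfiesBoth('1 2 345'));
// The lookaheads alone do not restrict which other characters may appear,
// which is why the full pattern ends with [\d ]*$.
console.log(satisfiesBoth('a12345'));
```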

Here is a visualisation of the regex in action:

[Image: regex visualisation]

Note that if you actually wanted the "optional space" to include things like tabs, then you could instead use \s and \S.


Update: Since this question appears to have gotten quite a bit of traction, I wanted to clarify something about this answer.

There are several "simpler" variant solutions to my answer above, such as:

// Only look for digits and spaces, not "non-digits" and "non-spaces":
^(?=( ?\d){5} *$)(?=\d* ?\d*$)

// Like above, but also simplifying the second lookahead:
^(?=( ?\d){5} *$)\d* ?\d*$

// Or even splitting it into two, simpler, problems with an "or" operator: 
^(?:\d{5}|(?=\d* \d*$).{6})$


Or even, if we can assume that the string is no more than 6 characters then even just this is sufficient:

^(?:\d{5}|\d* \d*)$

So with that in mind, why might you want to use the original solution, for similar problems? Because it's generic. Look again at my original answer, re-written with free-spacing:

^
(?=(\D*\d){5}\D*$) # Must contain exactly 5 digits
(?=[^ ]* ?[^ ]*$)  # Must contain 0 or 1 spaces
[\d ]*$            # Must contain ONLY digits and spaces

This pattern of using successive look-aheads can be used in various scenarios, to write patterns that are highly structured and (perhaps surprisingly) easy to extend.

For example, suppose the rules changed and you now wanted to match 2-3 spaces, 1 . and any number of hyphens. It's actually very easy to update the regex:

^
(?=(\D*\d){5}\D*$)       # Must contain exactly 5 digits
(?=([^ ]* ){2,3}[^ ]*$)  # Must contain 2 or 3 spaces
(?=[^.]*\.[^.]*$)        # Must contain 1 period
[\d .-]*$   # Must contain ONLY digits, spaces, periods and hyphens
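
JavaScript's built-in RegExp has no free-spacing mode as far as I know, so one way to keep this layout is to write each rule as its own regex literal and join the sources. This is only a sketch of that idea; `rules`, `allowed` and `pattern` are names I made up for illustration.

```javascript
// Each lookahead is kept on its own line with a comment, then joined via .source.
const rules = [
    /(?=(\D*\d){5}\D*$)/,      // Must contain exactly 5 digits
    /(?=([^ ]* ){2,3}[^ ]*$)/, // Must contain 2 or 3 spaces
    /(?=[^.]*\.[^.]*$)/,       // Must contain 1 period
];

// Must contain ONLY digits, spaces, periods and hyphens.
const allowed = /[\d .-]*/;

const pattern = new RegExp(
    '^' + rules.map((rule) => rule.source).join('') + allowed.source + '$'
);

console.log(pattern.test('12 3.4 5'));
console.log(pattern.test('12 3 45'));
```

Adding or removing a requirement then means adding or removing one entry in the array.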

...So in summary, there are "simpler" regex solutions, and quite possibly a better non-regex solution to OP's specific problem. But what I have provided is a generic, extensible design pattern for matching patterns of this nature.

Tom Lord
  • 1
    Won't this pass `"f7364ffff8f "` through? "The input could be 5 or 6 characters long and had to contain exactly 5 numbers and one optional space" – Yury Tarabanko Jun 09 '17 at 16:32
  • Thank you for the correction @YuryTarabanko, I have resolved the issue. – Tom Lord Jun 09 '17 at 16:46
  • 1
    A simpler solution is `^(?=(\D*\d){5}\D*$)\d* ?\d*$`, which reads: "exactly 5 digits" and "nothing but digits with an optional space". Using the added condition that there are no more than 6 chars, it can get even simpler: `^( ?\d){5} ?`. The added condition can be stated as `(?=.{,6}$)`. – maaartinus Jun 09 '17 at 20:37
  • 4
    Way too complicated. Too easy to misread or be confused by in 6 months. Too hard to change if requirements change. – jpmc26 Jun 09 '17 at 23:38
  • @jpmc26 Too complicated and confusing? Yes, it's far from a simple answer! If there's a good viable non-regex answer for whatever the specific use case is, then it's definitely worth considering. However, it's actually *not* super hard to change, precisely because of how I've written it. See the edit above... – Tom Lord Jun 10 '17 at 09:48
  • Note that `\d` could mean more than just the digits `0-9`: https://stackoverflow.com/a/6479605/3878168. If you only want the digits `0-9` you should change `\d` to `[0-9]` and `\D` to `[^0-9]`. – Yay295 Jun 10 '17 at 13:46
  • 1
    @yay295 Let's not muddy the waters here; it's fine to use `\d`. That post you've linked is talking about python3, not javascript. And issues would only perhaps arise if you enable the utf8 regex modifier (`/u`) - and even then, I'm unsure how javascript behaves. – Tom Lord Jun 10 '17 at 14:01
  • My simpler solution generalizes when you say "space, period or hyphen" instead of "non-digit". This way you can always save the "Must contain ONLY ..." part (unless you need to constrain the total length). Anyway, in the general case, I'd prefer your systematic approach. It's neither complicated nor confusing, once we understand that `(?=...$)` is just a poor-man's intersection. +1 – maaartinus Jun 10 '17 at 17:48
  • The `^(?:\d{5}|\d* \d*)$` pattern for "assume that the string is no more than 6 characters" validates e.g. `11 45` and thus does not work. – Mikal Madsen Nov 27 '17 at 16:17
  • @MikalMadsen Yeah, sorry - I didn't think that bit through fully. The logic should be "check that the string is 6 chars and only *then* use the `\d* \d*` regex". – Tom Lord Nov 27 '17 at 17:28
8

I suggest first checking for exactly five digits with ^\d{5}$, OR looking ahead for a single space between digits with ^(?=\d* \d*$) among six characters .{6}$.

Combining those partial expressions yields ^\d{5}$|^(?=\d* \d*$).{6}$:

let regex = /^\d{5}$|^(?=\d* \d*$).{6}$/;

console.log(regex.test('00000'));   // true
console.log(regex.test(' 00000'));  // true
console.log(regex.test('00000 '));  // true
console.log(regex.test('00 000'));  // true
console.log(regex.test('  00000')); // false
console.log(regex.test('00000  ')); // false
console.log(regex.test('00  000')); // false
console.log(regex.test('00 0 00')); // false
console.log(regex.test('000 000')); // false
console.log(regex.test('0000'));    // false
console.log(regex.test('000000'));  // false
console.log(regex.test('000 0'));   // false
console.log(regex.test('000 0x'));  // false
console.log(regex.test('0000x0'));  // false
console.log(regex.test('x00000'));  // false

Alternatively match the partial expressions separately via e.g.:

/^\d{5}$/.test(input) || input.length == 6 && /^\d* \d*$/.test(input)
le_m
7

This seems more intuitive to me and is O(n)

function isInputValid(input) {
    // Only 5 or 6 characters can hold five digits plus an optional space.
    const length = input.length;
    if (length !== 5 && length !== 6) {
        return false;
    }

    let spaceSeen = false;
    let digitsSeen = 0;
    for (const character of input) {
        if (character === ' ') {
            // A second space makes the input invalid.
            if (spaceSeen) {
                return false;
            }
            spaceSeen = true;
        }
        else if (/^\d$/.test(character)) {
            digitsSeen++;
        }
        else {
            // Anything other than a digit or a space is rejected.
            return false;
        }
    }

    return digitsSeen === 5;
}
jpmc26
  • Like your approach. – Piyush Jun 09 '17 at 16:32
  • Yup, you were right all along. This was never a purely regex problem to begin with. I came up with a somewhat different approach (which you can see edited into my question). I'm not sure if yours is more efficient, but I upvoted your answer, as you quickly figured out what I should've figured out long time ago. Cheers! – EyfI Jun 09 '17 at 17:13
  • Oh, looking into your function a bit, it will probably return true for 6 numbers too, which it shouldn't. – EyfI Jun 09 '17 at 17:27
  • 1
    @EyfI Fixed it. Hope you don't mind, Jonathan. +1 for a non-regex answer. Regex is not the way to do all string processing. – jpmc26 Jun 09 '17 at 23:37
1

You can split it in half:

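var input = '0000 0';

// The first test allows at most one space; the second requires exactly five digits once it is removed.
if (/^[^ ]* ?[^ ]*$/.test(input) && /^\d{5}$/.test(input.replace(/ /, '')))
  console.log('Match');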
Thomas Ayoub
1

Here's a simple regex to do the job:

^(?=[\d ]{5,6}$)\d*\s?\d*$

Explanation:

^ asserts position at the start of the string

Positive Lookahead (?=[\d ]{5,6}$)

Asserts that the regex below matches:

  • [\d ]{5,6} matches a single character present in the list below
  • {5,6} Quantifier: matches between 5 and 6 times, as many times as possible, giving back as needed (greedy)
  • \d matches a digit (equal to [0-9])
  • " " matches the space character literally (case sensitive)
  • $ asserts position at the end of the string

\d* matches a digit (equal to [0-9])

  • * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

\s? matches any whitespace character (equal to [\r\n\t\f\v ])

  • ? Quantifier: matches between zero and one times, as many times as possible, giving back as needed (greedy)

\d* matches a digit (equal to [0-9])

  • * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string

John Smith
  • 1
    Now it is a bit too relaxed regarding the number of digits, it matches '000000' and '000 0' – le_m Jun 09 '17 at 17:22
0
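const string = "12345 ";
const digits = string.replace(/\s/g, '');
if (string.length <= 6 && digits.length === 5 && /^\d+$/.test(digits)) {
  alert("valid");
}

You could simply check the length and whether what remains is a valid number. Testing the digits directly is safer than parseInt, which stops at the first non-digit (so "12a45" parses as 12) and returns the falsy value 0 for "00000".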

Jonas Wilms
  • 1
    this check fails unless the space is at the front or end of `string` since that's the only place that trim() will remove the space and make string.length <= 5 – Mike Corcoran Jun 09 '17 at 16:24
0

This is how I would do it without regex:

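string => {
    const [spaces, digits] = [...string].reduce(
        ([spaces, digits], char) =>
            [spaces + (char === ' '), digits + (char >= '0' && char <= '9')],
        [0, 0]
    );
    // At most one space, exactly five digits, and no other characters.
    return spaces <= 1 && digits === 5 && spaces + digits === string.length;
};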
corvus_192