
I want to build a Regex that does the following:

  1. Limit inputs to 0-9, A-Z, a-z, and the characters \ - | /
  2. Make sure the first character is not 0.

So these should get through:

  • ASDFGHJKLQWERTYUIOPZXCVBNM
  • SUBSCRIBETOME
  • AHH-HHH01/\-65AHH
  • 1234567890
  • 1
  • ||||||||||

But these shouldn't:

  • 0
  • !
  • 0123456789
  • !@#$%^&*()
  • {}:"

So far, I have this:

^[^0\S][\\\-/]*\w*\S*$

Per my understanding of Regex, this is what happens: the first [^] set blacklists whitespace and 0 from being the first character. The latter parts ([\\\-/]*, \w* and \S*) match 0-9, A-Z, a-z, and \ - | /.

My issue is that right now the blacklist only covers the first character, and there are other characters I would like to blacklist there. Most special characters also get through if they are the first character. These will get through:

  • !
  • !AHHHHHHHH

As a result, I am looking to expand the blacklist, like so:

^[^0\S@#$%^&*()][\\\-/]*\w*\S*$

But doing it like this would require me to put in a lot of special characters, which I am trying to avoid.

Doing this, as suggested in Regular expression for excluding special characters:

^[\\\-/]*\w*\S*$

would change it from a blacklist to a whitelist system, but would drop my second requirement, that 0 cannot be the first character. This is the main reason I find other answers on Stack Overflow hard to apply: the first character has a slightly different restriction from the rest of the characters.
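A quick check with C++ std::regex (ECMAScript grammar) suggests why special characters still get through: \S matches any non-whitespace character, so the trailing \S* lets ^[\\\-/]*\w*\S*$ accept every string that contains no whitespace, including 0 and !@#$%^&*(). The helper name below is hypothetical, just for illustration.

```cpp
#include <regex>
#include <string>

// The "whitelist" pattern from the question, as a raw string so the
// backslashes reach std::regex unchanged.
//   [\\\-/]* : zero or more of  \  -  /
//   \w*      : zero or more of  A-Z a-z 0-9 _
//   \S*      : zero or more of ANY non-whitespace character, which makes
//              the two earlier parts irrelevant
bool matches_question_whitelist(const std::string& input)
{
    static const std::regex pattern(R"(^[\\\-/]*\w*\S*$)");
    return std::regex_match(input, pattern);
}
```

With this, "!@#$%^&*()" and "0123456789" both match; only input containing whitespace, such as "AHH HHH", is rejected.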

I am wondering whether there is an easier way of indicating the following logic:

"Regex needs to blacklist '0' and whitespaces from the 1st character, but also anything not part of the following whitelist"

Please let me know if more information is required.
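One way to state "no 0 first, whitelist everything" in a single pattern is a negative lookahead, (?!0), which std::regex's ECMAScript grammar supports. This technique is chosen here for illustration and is not taken from the thread; the function name is hypothetical. The lookahead only inspects the first position, and the character class then does the whitelisting for every character, including the first.

```cpp
#include <regex>
#include <string>

// Sketch of the question's logic with a negative lookahead:
//   (?!0)              : the first character must not be 0
//   [0-9A-Za-z\\|/-]+  : one or more whitelisted characters; the final -
//                        is literal because it is last in the class
bool is_valid_with_lookahead(const std::string& input)
{
    static const std::regex pattern(R"(^(?!0)[0-9A-Za-z\\|/-]+$)");
    return std::regex_match(input, pattern);
}
```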

Some programmer dude
HFOrangefish

3 Answers


The simpler the better:

/^[1-9A-Za-z\\|\/-][0-9A-Za-z\\|\/-]*$/

See https://regex101.com/

^ asserts position at start of a line

Match a single character present in the list below [1-9A-Za-z\\|\/-]
1-9 matches a single character in the range between 1 (index 49) and 9 (index 57) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
\\ matches the character \ with index 92 (hex 5C, octal 134) literally (case sensitive)
| matches the character | with index 124 (hex 7C, octal 174) literally (case sensitive)
\/ matches the character / with index 47 (hex 2F, octal 57) literally (case sensitive)
- matches the character - with index 45 (hex 2D, octal 55) literally (case sensitive)

Match a single character present in the list below [0-9A-Za-z\\|\/-]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
\\ matches the character \ with index 92 (hex 5C, octal 134) literally (case sensitive)
| matches the character | with index 124 (hex 7C, octal 174) literally (case sensitive)
\/ matches the character / with index 47 (hex 2F, octal 57) literally (case sensitive)
- matches the character - with index 45 (hex 2D, octal 55) literally (case sensitive)

$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
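Since the thread also discusses C++, here is a hedged sketch of this answer's pattern with std::regex. The /.../ delimiters and the \/ escape only matter in JavaScript/PCRE-style literals, so they are dropped in the raw string; the g and m flags shown by regex101 are not needed because std::regex_match already requires the whole string to match. The function name is hypothetical.

```cpp
#include <regex>
#include <string>

// The pattern from this answer, without /.../ delimiters or the \/ escape.
//   [1-9A-Za-z\\|/-]   : first character, anything allowed except 0
//   [0-9A-Za-z\\|/-]*  : every following character, 0 now allowed
bool is_valid_input_regex(const std::string& input)
{
    static const std::regex pattern(R"(^[1-9A-Za-z\\|/-][0-9A-Za-z\\|/-]*$)");
    return std::regex_match(input, pattern);
}
```

Because the first class requires exactly one character, the empty string is rejected as well.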

YSC

Personally I wouldn't use a regular expression for this. Plain std::isalnum would take care of most of it, then some explicit checks for the remaining few characters and conditions:

#include <algorithm>
#include <cctype>
#include <string>

bool check_valid_input(std::string const& input)
{
    // Don't allow empty inputs, or inputs that start with the digit 0
    if (input.length() == 0 || input[0] == '0')
    {
        return false;
    }

    // Allow all alpha-numeric characters, plus a few others
    return std::all_of(begin(input), end(input), [](char ch)
    {
        // Cast to unsigned char: passing a negative char value to std::isalnum is undefined behavior
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '\\' || ch == '/' || ch == '-' || ch == '|';
    });
}

Regular expressions are almost always overkill, and tend to become too complex and hard to maintain.
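std::isalnum classifies characters according to the current C locale, so outside the default "C" locale it may accept letters beyond A-Z and a-z. If that matters, a variant of check_valid_input can spell out the allowed set and use std::string::find_first_not_of, which returns std::string::npos when every character is in the set. This is a sketch; the function name is hypothetical.

```cpp
#include <string>

// Variant of check_valid_input with an explicit allowed set instead of
// std::isalnum, so the result does not depend on the current locale.
bool check_valid_input_explicit(std::string const& input)
{
    // Every character the question allows: digits, both letter cases, \ - | /
    static const std::string allowed =
        "0123456789"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "\\-|/";

    // Reject empty inputs and inputs that start with the digit 0
    if (input.empty() || input[0] == '0')
    {
        return false;
    }

    // npos means no character outside the allowed set was found
    return input.find_first_not_of(allowed) == std::string::npos;
}
```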

Some programmer dude

To satisfy your conditions:

  • Limit inputs to 0-9, A-z (upper and lower), and \ - | /
  • Make sure the first character is not 0.

You can write your regex as ^[1-9A-Za-z\\\-|/][0-9A-Za-z\\\-|/]*$

  • ^[1-9A-Za-z\\\-|/] ensures the first character is a digit from 1-9, an uppercase or lowercase letter, or one of the symbols \, -, |, /.
  • [0-9A-Za-z\\\-|/]*$ requires each remaining character to be a digit from 0-9, an uppercase or lowercase letter, or one of the symbols \, -, |, /.

Regex101 is a good online tool for testing and debugging regex.

Akhilesh Pandey
  • Too late :p As a trick you could find useful: you can avoid escaping `-` by making it the last item in a `[class]` – YSC Jul 28 '23 at 09:07
  • Yes, you're correct. When `-` is the last character in a character class, it does not need to be escaped, because it cannot form a range with a non-existing character after it. Thank you for the tip :) – Akhilesh Pandey Jul 28 '23 at 09:17
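The tip from these comments can be checked mechanically. This hedged C++ sketch (the function name is hypothetical) compares the escaped spelling [...\\\-|/] with the trailing-dash spelling [...\\|/-] over every printable ASCII character using std::regex_match; it should return true if both classes accept exactly the same characters.

```cpp
#include <regex>
#include <string>

// Compares two spellings of the same character class:
//   escaped  : the - is written as \-
//   trailing : the - is unescaped but placed last, so it cannot form a range
bool trailing_dash_equivalent()
{
    static const std::regex escaped(R"([0-9A-Za-z\\\-|/])");
    static const std::regex trailing(R"([0-9A-Za-z\\|/-])");
    // Printable ASCII runs from ' ' (32) to '~' (126)
    for (char c = ' '; c <= '~'; ++c)
    {
        const std::string s(1, c);
        if (std::regex_match(s, escaped) != std::regex_match(s, trailing))
        {
            return false;
        }
    }
    return true;
}
```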