
I would like to validate a user's input and restrict it to alphanumeric characters only (underscores may be allowed as well), but I'm not sure which method is best for this.

I've seen various examples on Stack Overflow, and the first one that raises some questions for me is the following:

:input
set "in="
set /p "in=Please enter your username: "

ECHO(%in%|FINDSTR /ri "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" >nul || (

    goto input

)

I see a second character class that's identical to the first one (except for the leading ^ and the trailing *$).

Why are the extra character class and the ^ and *$ needed, when the following also seems to work?

:input
set "in="
set /p "in=Please enter your username: "

ECHO(%in%|FINDSTR /ri "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >nul || (

    goto input

)

Finally, there is the FOR /F loop method, which I've also noticed on here:

for /f "delims=1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%a in ("%in%") do goto :input

Is there any (dis)advantage in using this over the aforementioned FINDSTR regex one?

aschipfl
script'n'code
  • `^` and `$` are anchors that match the beginning and the end of the line, respectively. If the pattern does not match starting right at the beginning of the line and continue matching all the way to the end, the first regex fails, whereas the second pattern matches any suitable substring anywhere within the input. Also, you might want to consider using character sets such as `[A-Z]` and `\d` instead of listing out the entire alphabet and all 10 digits. The entire regex pattern you posted is equivalent to `[A-Z\d]`. – CAustin Jun 02 '17 at 00:55
  • @CAustin `findstr` regex syntax doesn't support `\d`. It'd have to be `^[a-z0-9_]*$` with the `/i` switch, or `^[A-Za-z0-9_]*$` without. – rojo Jun 02 '17 at 04:03
  • CAustin: Got it! thanks :-) rojo: Thank you for mentioning it! – script'n'code Jun 02 '17 at 13:52

2 Answers


First, you have to reference the environment variable named in with delayed expansion, to avoid batch file execution exiting because of a syntax error when the user enters a string containing critical characters like ><|&". Always keep in mind that a variable referenced as %variable% is expanded before the command line is executed, which can easily break batch execution when the variable holds user input.
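The difference between the two expansion kinds can be seen in a minimal sketch. The value a&ver is a hypothetical stand-in for user input, assigned with set instead of set /P so the script runs without a prompt.

```batch
@echo off
rem Hypothetical stand-in for user input containing the critical character &.
set "IN=a&ver"

rem Normal expansion: %IN% is replaced before the line is parsed,
rem so & splits the line and ver runs as a separate command.
echo %IN%

rem Delayed expansion: !IN! is replaced after parsing,
rem so the & is printed literally as part of the text.
setlocal EnableDelayedExpansion
echo !IN!
endlocal
```

The first echo is expected to print just a, followed by the output of the ver command; the second prints a&ver unchanged.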

Second, it is strongly recommended to verify immediately whether the user has entered anything at all, i.e. use if not defined in goto input right after the prompt command line.

Third, I think the FOR method is better because it is most likely faster.

FINDSTR is not an internal command of cmd.exe like FOR. So when FINDSTR is specified in a batch file without path and without file extension, the Windows command interpreter must first search for this executable using PATHEXT and PATH, and hopefully really finds %SystemRoot%\System32\findstr.exe.

Next, with an anti-virus process running in the background, executing findstr.exe triggers a scan by the anti-virus process, which delays execution.

Executing an application like FINDSTR from the Windows command interpreter always takes a bit longer than executing an internal command of cmd.exe, even with no anti-virus scan running. So the FOR loop approach is most likely (not verified by me) faster than the FINDSTR approach.

When using FINDSTR, the regular expression characters ^ and *$ are needed because the search string [0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ] results in a positive match if the processed line contains at least one digit or letter anywhere. So it is not checked whether the line (= the string of the variable) consists only of digits and letters. The shorter character class definitions [0-9A-Z] (with option /I) or [0-9A-Za-z] can't be used in this case, as explained by aschipfl in his comment below.

The ^ specifies that the searched string must be found at the beginning of a line, the * that 0 or more further digits or letters may follow, and the $ that the searched string must end at the end of the line. In other words, for a positive match the entire line (the user input, already checked to be non-empty) must consist only of digits and letters.
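The effect of the anchors can be checked with a small demo. The sample text abc-1 2 is hypothetical input that contains the disallowed characters - and space.

```batch
@echo off
rem Hypothetical sample input with a hyphen and a space, which are not allowed.
rem Unanchored: matches because the line contains at least one digit or letter.
echo abc-1 2| findstr /R /I "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" > nul && echo unanchored: accepted || echo unanchored: rejected

rem Anchored: ^ and $ force the entire line to consist of digits and letters only.
echo abc-1 2| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul && echo anchored: accepted || echo anchored: rejected
```

This should print unanchored: accepted and anchored: rejected, which is exactly why the unanchored variant from the question is not a valid check.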

For every internal or external command, help can be obtained by running the command with /? as parameter from within a command prompt window. Try it out by opening a command prompt window and running findstr /?, for /? and set /?.

Mofi
  • Note that character ranges may return false positives: `[0-9]` may also match `²` and `³`, depending on code page; `[A-Z]` also matches lower-case letters (and vice versa), and it may also match `Á` or `á`, for example, also depending on code page; see this for details: [What are the undocumented features and limitations of the Windows FINDSTR command?](https://stackoverflow.com/q/8844868) – aschipfl Jun 02 '17 at 08:59
  • May I say "oh my god"? I'm amazed by the quality of your (and aschipfl's) answer. It has everything I wanted/needed to know (and more). The only dilemma I'm facing now is which answer to mark as solution (can there be two? ^_^ ). Nevertheless, I'm greatly impressed and appreciate your %input% (pun intended), Mofi. :-) – script'n'code Jun 02 '17 at 14:05
  • I also learned a lot from reading the excellent answer written by [aschipfl](https://stackoverflow.com/users/5047996/aschipfl) and upvoted his answer, too. It was definitely a good decision to favor his answer over mine for accepting. – Mofi Jun 02 '17 at 14:40

For safely validating user input, both methods can be used reliably, but you must improve them:


findstr method

At first, let us focus on the search string like ^[...][...]*$ (where ... stands for a character class, meaning a set of characters): A character class [...] matches any one character from set ...; * means repetition, so matching zero or more occurrences, hence [...]* matches zero or more occurrences of characters from set ...; therefore, [...][...]* matches one or more occurrences of characters from set .... The leading ^ anchors the match to the beginning of the line, the trailing $ anchors it to the end; therefore, when both anchors are specified, the entire line must match the search string.

Concerning character classes [...]: According to the thread What are the undocumented features and limitations of the Windows FINDSTR command?, character ranges within classes are buggy; for instance, the range [A-Z] matches small letters b to z, and [a-z] matches capital letters A to Y (this of course does not matter if a case-insensitive search is done, so when /I is given); the range [0-9] may match ² or ³, depending on the current code page; [A-Z] and [a-z] may match special letters like Á or á, for example, also depending on the current code page. Hence, to safely match certain characters only, do not use ranges, but specify each character individually, like [0123456789], [ABCDEFGHIJKLMNOPQRSTUVWXYZ] or [abcdefghijklmnopqrstuvwxyz].
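The range problem can be tried out with a short case-sensitive test. The input b is just a hypothetical sample; whether the first check reports a match is taken from the linked thread and not asserted here, while the explicit class is expected not to match.

```batch
@echo off
rem Case-sensitive search (no /I). According to the linked thread, the range [A-Z]
rem also matches the lowercase letters b to z, so this check may report a match.
echo b| findstr /R "^[A-Z]$" > nul && echo range [A-Z]: matched b || echo range [A-Z]: did not match b

rem Listing every capital letter explicitly does not have this problem.
echo b| findstr /R "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]$" > nul && echo explicit class: matched b || echo explicit class: did not match b
```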

All this leads us to the following findstr command line:

findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"

Nevertheless, the whole approach with the piped echo might still fail, because special characters like ", &, ^, %, !, (, ), <, >, | could lead to syntax errors or other unintended behaviour. To avoid that, we need to establish delayed expansion, so the special characters become hidden from the command parser. However, since pipes (|) initialise new cmd instances for either side (which inherit the current environment), we need to ensure that the actual variable expansion happens in the left child cmd instance rather than in the parent one, like this:

:INPUT
set "IN="
set /P IN="Please enter your username: "

cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul || goto :INPUT

The extra explicit cmd instance is needed to enable delayed expansion (/V), because the instances initiated by the pipe have delayed expansion disabled.

The doubled escaping of the exclamation marks ^^! is only needed in case delayed expansion is also enabled in the parent cmd instance; if not, single escaping ^! would be sufficient, but doubled escaping does no harm.


for /F method

This approach makes life easier, because no pipe is involved, so you do not have to deal with multiple cmd instances; but there is still room for improvement. Again, special characters may cause trouble, so delayed expansion needs to be enabled.

The for /F loop ignores empty lines and lines beginning with the default eol character, the semicolon ;. To disable the eol option, simply set it to one of the delimiter characters, so eol becomes hidden behind delims. Empty lines are not iterated, so the goto command in your approach would never execute in case of empty user input, and empty input would pass the check. Therefore, we must capture empty user input explicitly, using an if statement. All this leads to the following code:

setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "

if not defined IN goto :INPUT
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do goto :INPUT

endlocal
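Why eol has to be overridden can be seen in a minimal demo. The value ;abc is hypothetical input that starts with the default eol character and contains disallowed lowercase letters.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical input starting with a semicolon, the default eol character.
set "IN=;abc"

rem Default eol=;: the line is treated as a comment and skipped, so nothing is echoed
rem and the invalid input would slip through the check.
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%Z in ("!IN!") do echo default eol: loop iterated

rem eol=0 (a delimiter character): the line is processed, the loop iterates
rem and the invalid characters are detected.
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do echo eol=0: loop iterated
endlocal
```

Only the second loop is expected to print a line.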

This approach detects capital letters only; to include small letters as well, you have to add them to the delims option: delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
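Putting this together, a sketch of the for /F check that also accepts small letters could look like the following. Allowing the underscore is an assumption taken from the question ("underscores may be allowed"); drop it from delims if it is not wanted.

```batch
@echo off
setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "

rem Empty input: for /F would not iterate, so catch it explicitly.
if not defined IN goto :INPUT

rem The allowed characters are the delimiters (underscore is an assumption here).
rem Any remaining token means a disallowed character was entered, so ask again.
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_ eol=0" %%Z in ("!IN!") do goto :INPUT

echo Accepted username: !IN!
endlocal
```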

Note that variable IN is no longer available beyond endlocal, but this should be the very last command of your script anyway.

To detect whether or not a for /F loop iterated, there is an undocumented feature we can make use of: for /F returns a non-zero exit code if it does not iterate, hence the conditional execution operators && and || can be used. When the (non-empty) user input consists only of allowed characters, the loop does not iterate, so && goto :INPUT is skipped; when it contains any other character, the loop iterates and the goto is executed. For this to work, the for /F loop must be enclosed within parentheses:

setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "

if not defined IN goto :INPUT
(for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do rem/) && goto :INPUT

endlocal
aschipfl
  • Are you a book writer? If not, consider becoming one! I enjoyed reading every single word, and you also taught me things I didn't know were possible in the first place :-) I'm a guy that RTFM's a lot, but sometimes a different approach is needed and Stack Overflow offers me the best solution(s). - I think I'll have to mark your answer as solution. It somehow tickles my brain by an edge. Your total reputation is lower than Mofi's, so I guess my vote is well earned/deserved! ;-) – script'n'code Jun 02 '17 at 14:25
  • Thank you for the flowers! ;-) No, I am not a book writer, but I stumbled over the same issues several times in the past and tried a lot to do fool-proof user input validation, also relying on the information from the [post I have linked](https://stackoverflow.com/q/8844868)... – aschipfl Jun 02 '17 at 15:02
  • Don't forget to give them enough water on a daily basis ;-) PS: To allow the lowercase alphabet, I suppose I have to add these to the delims as well? It doesn't look sweet, but I wouldn't want the code to fail using FINDSTR regex if it encounters special/poison characters. – script'n'code Jun 02 '17 at 16:26
  • I will... ;-) Yes, you need to list them all like `for /F "delims=ABC...Zabc...z"`... – aschipfl Jun 02 '17 at 16:52
  • **One more thing:** In the FOR /F loop, what's the '/' needed for in 'rem/'? – script'n'code Jun 02 '17 at 18:00
  • `rem` is there to avoid an empty loop body, as this is forbidden; `rem` however treats all following text as a remark, whereas `rem/` lets `) &&` be recognised and executed... – aschipfl Jun 02 '17 at 18:08
  • status confirmed: Knowledge expanded ;-) – script'n'code Jun 02 '17 at 19:53