
So - here's the code

@echo off
setlocal

for %%a in (a A j J z Z) do for %%c in (d D) do if "%%c" geq "%%a" (echo "%%c" geq "%%a") else (echo "%%c" lss "%%a")

for %%a in (Blue blue BLUE Red red RED) do for %%c in (Pink pink PINK) do if "%%c" geq "%%a" (echo "%%c" geq "%%a") else (echo "%%c" lss "%%a")

Here's the result:

Microsoft Windows [Version 10.0.19042.804]
(c) 2020 Microsoft Corporation. All rights reserved.

"d" geq "a"
"D" geq "a"
"d" geq "A"
"D" geq "A"
"d" lss "j"
"D" lss "j"
"d" lss "J"
"D" lss "J"
"d" lss "z"
"D" lss "z"
"d" lss "Z"
"D" lss "Z"
"Pink" geq "Blue"
"pink" geq "Blue"
"PINK" geq "Blue"
"Pink" geq "blue"
"pink" geq "blue"
"PINK" geq "blue"
"Pink" geq "BLUE"
"pink" geq "BLUE"
"PINK" geq "BLUE"
"Pink" lss "Red"
"pink" lss "Red"
"PINK" lss "Red"
"Pink" lss "red"
"pink" lss "red"
"PINK" lss "red"
"Pink" lss "RED"
"pink" lss "RED"
"PINK" lss "RED"

(I just cut/paste/censored the screen to show the Windows version)

So - I'm about to go and do my grocery shopping (since it's ~midnight), which will allow me to clear my head.

Am I seeing things? Doesn't batch do an ASCII comparison any more? I recall it used to.

AFAICS, if has suddenly decided to automatically do a case-insensitive comparison. That'll break many a SO solution.


So - bulk responses. I'm still trying to process it all.

The base problem:

I have a label printer which has 5 inbuilt dot-matrix fonts and allows a multiplier to be assigned to both the X and Y dimension. Obviously, applying 1 for the X multiplier and 6 for Y produces ugly tall-and-thin characters. Reversing these produces ugly squat characters. Also, not all multipliers (1-9) are available in each direction.

I therefore have a table of "acceptable" X-Y multiplier-pairs which do not produce obviously-distorted characters. Since I'm dealing with monospaced fonts, each font & multiplier-pair yields a coverage per character, which I want to maximise. Applying each possible combination to resolve the maximum coverage is a simple process, allowing the selection of the font and multiplier for each element of the label.

The fly in the ointment is that one font is upper-case only, so I wanted to apply an exclusion for that font if the element contains a lower-case letter.

The code I used for that exclusion was to apply an if test to each character %%c

 if "%%c" geq "a" if "%%c" leq "z" set "islower=Y"

BUT this doesn't work as advertised. It will set islower regardless of case. So the caps-only font was always excluded where the text contained alphas and therefore I never observed its being used. Oops.

Hence this question.

I've been doing many experiments as well as catching up on beauty-sleep (which I need) for the past few days, frankly dreading the volume of responses and comments.

Conclusions:

/i is stripped from the if command if it's the first token.
not is then stripped from the if command if it's the (new) first token.

  • hence beware if string1==string2 when string1 resolves to not

We're then left with string1 operator string2 and a complex relationship between the operator used and the precise format of the strings.

== is the simplest. It does not need to be preceded or followed by separators. The strings are compared exactly, character for character, hence you need the /i switch to perform a case-insensitive comparison.

equ and its family are where things get more complicated. Each must be preceded and followed by separators. The characteristics of the comparison depend on the structure of the operands.

  • the operands may be strings, "quoted strings", pure-decimal (digits 0-9) or pure-octal (leading 0, digits 0-7)
  • In the case of pure-numeric arguments, the arguments are converted to binary, and the results are compared.
    • Hence IF 066 equ 54 evaluates as TRUE because 066-octal equals 54 decimal. == predictably evaluates this as FALSE
  • In the case of quoted-decimal-strings IF "102" gtr "94" evaluates as FALSE because 1 is not greater than 9
  • With the gtr geq lss leq operators on strings, operation becomes truly bizarre. CAT gtr cat (quoted or no) evaluates as TRUE, as do dog gtr cat and "dog" gtr "cat", regardless of case of either operand.
    • Even more outlandish as everyone knows that cat is greater than dog.
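The numeric rule in the list above can be sketched in C with `strtol` and base 0, which treats a leading `0` as octal (and, as a side effect, a leading `0x` as hex, which this thread does not establish for `if`). Here an operand counts as "pure numeric" only if `strtol` consumes the whole string; that detection rule and the helper names are assumptions for illustration. The string fallback uses plain `strcmp`, which is adequate for the quoted digit strings shown here but is not a claim about how cmd.exe orders letters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Try to read the whole operand as a number. Base 0 means:
   leading "0x" -> hex, leading "0" -> octal, otherwise decimal. */
static bool parse_pure_number(const char *s, long *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    if (*s == '\0')
        return false;
    errno = 0;
    long v = strtol(s, &end, 0);
    if (errno != 0 || *end != '\0')
        return false;      /* leftovers, e.g. "08" or a quoted "\"102\"" */
    *out = v;
    return true;
}

/* Compare numerically if both operands are pure numbers,
   otherwise fall back to a byte-wise string comparison. */
static int compare_operands(const char *a, const char *b)
{
    long na, nb;
    if (parse_pure_number(a, &na) && parse_pure_number(b, &nb))
        return (na > nb) - (na < nb);
    return strcmp(a, b);
}
```

So `compare_operands("066", "54")` returns 0, mirroring `IF 066 equ 54` being TRUE, while the quoted forms fall back to string order, mirroring `"102" gtr "94"` being FALSE because `1` sorts before `9`.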

Sorry - It's really all too complicated for me. I'm off for a rest.

Unfortunately, this problem really doesn't seem to fit SO's Q&A format.

Magoo
  • I get the same results in a Windows XP virtual machine. I think it's always been case-insensitive, which is strange considering that the `/I` flag is explicitly listed in the compare-op section of the help as something you can use "to do case insensitive string compares." – SomethingDark Feb 22 '21 at 16:35
  • Change your %%a loop set to the following. `Blue blue BLUE Red red RED PinK pInk piNk PinK`. The output is case sensitive as expected when dealing with strings of equal length. The above example is expected behaviour when dealing with different string lengths using the comparators `EQU` / `GTR` / `LSS` etc – T3RR0R Feb 22 '21 at 17:07
  • @T3RR0R - if that was the case, then the script would output `Pink lss blue` because both are four characters and `P` has a lower ASCII value than `b`. – SomethingDark Feb 22 '21 at 17:11
  • These are the same length and D>j on the ASCII table: `"D" lss "j"` – jwdonahue Feb 22 '21 at 17:16
  • I would add that based on the length, `Pink > Red`. – jwdonahue Feb 22 '21 at 17:19
  • Except the comparison doesn't account for length. Example: `If p GTR abcdef Echo true` – T3RR0R Feb 22 '21 at 17:23
  • See [this Answer](https://stackoverflow.com/a/49601468/12343998) to a duplicate Question for details - The short of it is that comparisons using GEQ LSS etc are not ASCII based – T3RR0R Feb 22 '21 at 17:24
  • Well you said it does: "The above example is expected behaviour when dealing with different string lengths..." – jwdonahue Feb 22 '21 at 17:24
  • No, I said, maybe not clearly enough, the OP's post returning incorrect assessments when dealing with different string lengths is expected. Only when dealing with equal string length will the assessment be correct. – T3RR0R Feb 22 '21 at 17:27
  • Please review [the detailed explanation](https://stackoverflow.com/a/47386323/12343998) that's linked to in the already mentioned duplicate. It is expected that `D` should return as LSS than `j`. Strings are converted to integers when using the aforementioned operators, and the result returned is what can be expected from the utility that performs the conversion and the values it uses. – T3RR0R Feb 22 '21 at 17:42
  • @T3RR0R, no I withdrew that statement. I am also reassessing my earlier comments. Checking something... – jwdonahue Feb 22 '21 at 17:59
  • That still doesn't explain `Pink geq blue` though – SomethingDark Feb 22 '21 at 18:01
  • Need I repeat, the assessment is not based on ASCII values. – T3RR0R Feb 22 '21 at 18:03
  • @T3RR0R you are actually claiming it does. You claim that cmd.exe converts strings to a number and then compares those numbers. ASCII values are numbers. ASCII for "D" (68) is in fact less than ASCII for "a" (97). The precise internal methods employed by cmd.exe are not relevant. All that matters is, does it yield proper case sensitive sort order, and I think it does. So we probably agree for different reasons. – jwdonahue Feb 22 '21 at 18:13
  • Mofi's answer clearly says that if neither thing being compared is an integer, then cmd uses `strcmp` to compare the two values. [strcmp](https://www.cplusplus.com/reference/cstring/strcmp/) "performs a binary comparison of the characters" and I don't know how that can mean anything other than a comparison of ASCII values. – SomethingDark Feb 22 '21 at 18:16
  • @T3RR0R, btw, we're not talking about how it converts all numeric strings to numbers here, we're talking about how it compares quoted alphanumeric values, which it does treat as strings of characters. – jwdonahue Feb 22 '21 at 18:16
  • Take the time to read the linked pages and the links within. The comparisons are binary, on a character by character Vs string basis, terminating when the final character of either the first or second string is encountered - whichever happens first. [The documentation found here](http://www.cplusplus.com/reference/cstring/strcmp/) is for the function that cmd uses to perform the assessment. – T3RR0R Feb 22 '21 at 18:19
  • [See Also](https://en.cppreference.com/w/c/string/byte/strcmp) – T3RR0R Feb 22 '21 at 18:24
  • There's an old programmer's adage: "if you think you found a compiler bug, think again". This probably also holds for cmd.exe, if you ignore the handful of well-known quirks that they refuse to fix because we've all come to rely on them. – jwdonahue Feb 22 '21 at 18:32
  • I've tried the same batch (well, with more test-comparisons) on a Windows Server 2012 machine - and the results were identical. That puts paid to the "changed behaviour" theory. Now I'm worrying about how I failed to notice this over many years. – Magoo Feb 22 '21 at 20:12
  • Just for extra clarification, the output on my only PC, an old IBM 760XL (32MB RAM), running a sluggish Microsoft Windows 2000 Professional Operating System, (`cmd.exe` Version 5.00.2195), is exactly the same as that shown in the question too. – Compo Feb 22 '21 at 21:12
  • Just another interesting instance is with numbers beginning with 0 and integer comparison in batch. Some wacky stuff occurs, try doing comparisons with 010, 10, 08, etc – Nico Nekoru Feb 22 '21 at 22:21
  • @NicoNekoru - Numbers starting with `0` are considered octal, which can throw you off if you aren't expecting it. – SomethingDark Feb 22 '21 at 22:22
  • Oh wow I never knew that, does that mean batch has hex and other radix number support? – Nico Nekoru Feb 22 '21 at 22:24
  • @Magoo I took your script and ran with the `/I` option on the if statements and got exactly the same results. I think no matter the locale, particularly an English locale, there should have been a difference in output. So ya, we've both been missing something for a very long time. – jwdonahue Feb 22 '21 at 22:24
  • @NicoNekoru, I think for any all-numeric character string, it always attempts to convert to integer, and I do recall it supporting octal and hex. I don't recall testing what it does when one side of the compare is numeric and the other a string. – jwdonahue Feb 22 '21 at 22:30
  • @Magoo, can you take a look at my annotated list of your results in my non-answer below? If you agree those, can you update your post to include them, so it's more obvious what we're all puzzling over? Or I can do it, I just need more eyeballs on it to make sure I didn't invert my reading of the ASCII table. – jwdonahue Feb 22 '21 at 23:04
  • ... then my Windows Server 2012 machine suffered a hardware failure :( ... – Magoo Feb 28 '21 at 05:45

2 Answers


As already described in the comments, strings are compared on a character-by-character basis, with the comparison returning its value once the first non-matching character, or the end of the shorter string, is encountered.

The value of each character is converted into a non-locale specific binary value as an unsigned char. From IBM's knowledge Centre:

The relation between the strings is determined by subtracting: string1[i] - string2[i], as i increases from 0 to strlen of the smaller string. The sign of a nonzero return value is determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char)

References:
docs.microsoft
IBM knowledge Centre
C++
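The byte-wise rule in the IBM quote can be written out as a short C sketch: walk both strings and return the difference of the first differing pair of bytes, read as `unsigned char`, with the terminating `'\0'` taking part so that a proper prefix sorts first. This models only the quoted rule; whether cmd.exe actually uses it is what the comments dispute, and `byte_cmp` is an illustrative name. Under this rule `"Pink"` compares below `"pink"` (80 < 112) and below `"aaaaa"` (80 < 97), which a reader can set against the `IF` outputs listed below.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise comparison following the quoted rule: the result's sign is
   that of the first differing pair of bytes, both read as unsigned char.
   The terminating '\0' takes part, so a proper prefix sorts first. */
static int byte_cmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return (int)*a - (int)*b;
}

/* Sign helper so results can be compared with strcmp's sign. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}
```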

A simple set of tests that demonstrates the returns that should be expected for the manner in which strcmp assesses strings:

IF "Pink" equ "pink" (Echo true) else Echo false
false
IF "Pink" LSS "pink" (Echo true) else Echo false
false
IF "Pink" GTR "pink" (Echo true) else Echo false
true
IF "Pink" GTR "pinky" (Echo true) else Echo false
false
IF "Pink" lss "pinky" (Echo true) else Echo false
true
IF "Pink" GTR "Pinky" (Echo true) else Echo false
false

The confusion arises because the assessment disregards characters beyond the first non-matching character:

IF "Pink" lss "aaaaa" (Echo true) else Echo false
false
IF "Pink" GTR "Z" (Echo true) else Echo false
false

false is returned as soon as a non-matching character is encountered, rendering string length irrelevant to the comparison. In the first of the two examples above, false is returned by strcmp as soon as P is evaluated as being GTR a. The result would be the same if the comparison were: IF "Pink" lss "a" (Echo true) else Echo false

T3RR0R
  • Internal implementation details are not relevant, nor are they actually documented anywhere for cmd.exe. IBM's docs are definitely not relevant, because Microsoft and IBM went their separate ways nearly 30 years ago. The OP's question is, does `if` do a case sensitive compare, in the absence of the `/I` switch. They presented evidence that, on its surface, appeared to claim the answer was no. – jwdonahue Feb 22 '21 at 18:58
  • Interesting way to perform what equates to double-negative paraphrasing of the question. Nowhere did the op mention the /I switch, let alone use it in their example. The question asked if cmd.exe was suddenly performing a case insensitive assessment by default, which it is not, nor should it be as supported by the documentation for the internal utility that performs the assessment. More specifically still, the OP asked if the behavior they noticed represented a change in expected behaviour, which it does not. – T3RR0R Feb 22 '21 at 19:13
  • To quote the OP: "AFAICS, if has suddenly decided to automatically do a case-insensitive comparison.". And their script does not use the `/I` switch. I think we're saying the same thing, you and me, I just don't think your answer addresses the OP's issue. You are stating how you think it's supposed to work, how does that relate to the OP's evidence? – jwdonahue Feb 22 '21 at 19:19
  • Ah, there you go! Getting better now. – jwdonahue Feb 22 '21 at 19:19
  • I have included additional examples, not GEQ specifically, but also described the point that seems not to be getting across. The comparison ceases as soon as the first non-matching character is encountered, which explains how Pink cannot be GEQ Z. The comparison terminates immediately after the P. – T3RR0R Feb 22 '21 at 19:38
  • Sure. But what about `"D" geq "a"` ? – jwdonahue Feb 22 '21 at 19:42
  • In what world would you expect D not to be GEQ a, and If you do, why would you expect `strcmp` to return such a result? – T3RR0R Feb 22 '21 at 19:43
  • Look at the OP's output. Look at their script. D is 68 and a is 97. And strcmp is not relevant, this thread isn't tagged `C`. – jwdonahue Feb 22 '21 at 19:54
  • There is nothing erroneous in D or d being GEQ a or A. You seem very stuck on thinking in terms of ASCII values when they are of absolutely no relevance as they are not used in any way to perform the comparison. – T3RR0R Feb 22 '21 at 20:09
  • LOL. We've already established that string comparisons are in fact based on the ASCII values of the characters. What else could possibly be used? In fact, the Win10 cmd.exe uses UTF-8, which exactly overlaps the first 128 ASCII characters. – jwdonahue Feb 22 '21 at 20:14
  • So please review my manual checks of the OP's result, in my recent post. Haven't had my caffeine yet, and working manually with a black on bright white [ASCII table](http://www.asciitable.com/) is error prone. – jwdonahue Feb 22 '21 at 20:25
  • You've established no such thing, as proven by the results returned when using `if` in conjunction with operators other than `==`. D does not EQU 68, nor does a EQU 97, in the context in which the IF command executes; otherwise the command would return D LEQ a, which it does not. You are operating on a false assumption. – T3RR0R Feb 22 '21 at 20:25
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/229056/discussion-between-jwdonahue-and-t3rr0r). – jwdonahue Feb 22 '21 at 20:26
  • Unfortunately, I find your `C++` link to be a little suspect, and it appears to contradict your `strcmp` argument - `For a function that takes into account locale-specific rules, see strcoll.` The variable references change from `str1` to `ptr1` and if you look at the code example, it would appear the claimed output even spelling-corrects from the source code. – Magoo Feb 28 '21 at 08:00

I don't expect this to be the OP's answer here, but comments just don't cover it, so here's my two cents. This is a WIP.

Any arguments regarding cmd.exe's internal operations are moot. The question is, when given inputs, are the outputs expected in accordance with the official documentation, which implies that unless you use the /I switch, the if statement performs a case sensitive compare of strings.

I am not aware of any Microsoft documentation claiming any internal dependency on any specific C library functions, so we cannot refer to any standard library implementations here. That said, it's a good bet that the C strcmp implementation is used, but when I left Microsoft a few years back, some C++ was creeping into that code base, so it's just as likely that they are using the C++ std::string and its operators.

When faced with evidence that there is a bug in a widely used tool such as cmd.exe that nobody noticed in the past, we should first assume the fault is in the evidence. So let's look at that first.

Taking the OP's script output and adding manual check results and ignoring the quotes:

"d" geq "a" 100 geq 97 is true.
"D" geq "a" 68 geq 97 is FALSE!
"d" geq "A" 100 geq 65 is true.
"D" geq "A" 68 geq 65 is true.
"d" lss "j" 100 lss 106 is true.
"D" lss "j" 68 lss 106 is true.
"d" lss "J" 100 lss 74 is FALSE!
"D" lss "J" 68 lss 74 is true.
"d" lss "z" 100 lss 122 is true.
"D" lss "z" 68 lss 122 is true.
"d" lss "Z" 100 lss 90 is FALSE!
"D" lss "Z" 68 lss 90 is true.
"Pink" geq "Blue" 80,105,110,107 geq 66,108,117,101 true 80 > 66.
"pink" geq "Blue" 112,105,110,107 geq 66,108,117,101 true 112 > 66.
"PINK" geq "Blue" 80,73,78,75 geq 66,108,117,101 true 80 > 66.
"Pink" geq "blue" 80,105,110,107 geq 98,108,117,101 FALSE! 80 < 98.
"pink" geq "blue" 112,105,110,107 geq 98,108,117,101 true 112 > 98.
"PINK" geq "blue" 80,73,78,75 geq 98,108,117,101 FALSE! 80 < 98.
"Pink" geq "BLUE" 80,105,110,107 geq 66,76,85,69 true 80 > 66.
"pink" geq "BLUE" 112,105,110,107 geq 66,76,85,69 true 112 > 66. 
"PINK" geq "BLUE" 80,73,78,75 geq 66,76,85,69 true 80 > 66.
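Checking rows like these by hand against an ASCII chart is error-prone, so here is a small C oracle that returns the verdict each row would get if strings were compared byte-wise in ASCII/UTF-8 order. The quotes are left out because both operands carry them in the same positions; `ascii_verdict` is an illustrative name, and the operator spellings are taken from the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* ASCII oracle for one row of the manual check: compare the operands
   byte-wise with strcmp, then apply the named operator. */
static bool ascii_verdict(const char *lhs, const char *op, const char *rhs)
{
    int c = strcmp(lhs, rhs);
    if (strcmp(op, "lss") == 0) return c < 0;
    if (strcmp(op, "leq") == 0) return c <= 0;
    if (strcmp(op, "equ") == 0) return c == 0;
    if (strcmp(op, "neq") == 0) return c != 0;
    if (strcmp(op, "geq") == 0) return c >= 0;
    if (strcmp(op, "gtr") == 0) return c > 0;
    assert(!"unknown operator");
    return false;
}
```

Any row where the oracle's verdict differs from the script's printed verdict is a case where byte order was not applied.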

So the OP's examples appear to show cases where ASCII/UTF-8 sort order is not applied. Is that script bug or is it a cmd.exe bug?


Here's a more informative script:

@setlocal EnableExtensions

@echo ASCII: z ^> a ^> Z
@call :ReportCaseSensitivity "a" "A"
@call :ReportSortOrder "a" "A"
@call :ReportCaseSensitivity "z" "Z"
@call :ReportSortOrder "z" "Z"

@echo.
@echo ASCII d ^> a ^> D
@call :ReportCaseSensitivity "d" "D"
@call :ReportSortOrder "d" "a"
@call :ReportSortOrder "a" "D"

@exit /b 0

@REM Assumes %1 and %2 are the same letter in lower and uppercase.
:ReportCaseSensitivity
if %1 equ %2 (echo ignored case.) else (echo case senstive.)
if /I %1 equ %2 (echo ignored case.) else (echo case senstive.)
@exit /b 0

@REM Assumes %1 is the greater ASCII value than %2
:ReportSortOrder
if %1 gtr %2 (echo normal sort order.) else (echo reverse sort order.)
@exit /b 0

Yields:

>test
ASCII: z > a > Z

>if "a" EQU "A" (echo ignored case. )  else (echo case senstive. )
case senstive.

>if /I "a" EQU "A" (echo ignored case. )  else (echo case senstive. )
ignored case.

>if "a" GTR "A" (echo normal sort order. )  else (echo reverse sort order. )
reverse sort order.

>if "z" EQU "Z" (echo ignored case. )  else (echo case senstive. )
case senstive.

>if /I "z" EQU "Z" (echo ignored case. )  else (echo case senstive. )
ignored case.

>if "z" GTR "Z" (echo normal sort order. )  else (echo reverse sort order. )
reverse sort order.

ASCII d > a > D

>if "d" EQU "D" (echo ignored case. )  else (echo case senstive. )
case senstive.

>if /I "d" EQU "D" (echo ignored case. )  else (echo case senstive. )
ignored case.

>if "d" GTR "a" (echo normal sort order. )  else (echo reverse sort order. )
normal sort order.

>if "a" GTR "D" (echo normal sort order. )  else (echo reverse sort order. )
reverse sort order.

So lower case sorts before upper case for the same letter, but the ordering within each case class seems to be normal, and case sensitivity/non-sensitivity is functioning as expected, in accordance with whether the /I switch is present. The OP's choice of test script and oracle is simply inconclusive. The issue isn't that it's case insensitive; it's that the order of the case sorting is surprising and non-obvious using the OP's original data.


I chose the set {A, D, Z, a, d, z} because their ASCII/UTF-8 code points have the relation A < D < Z < a < d < z, and I would have at least three convenient points of reference within and between each of the lower and upper case sets. I was too tired to point it out at the time, but cmd.exe seems to use some other criteria, probably cultural.

So my choice of the phrase "reverse sort order" is probably inaccurate. The point is that it's sorting lowercase ahead of upper case, but using normal sort order within each of those two classes.
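The observation above (lower case ahead of upper case for the same letter, normal order otherwise) can be written down as a comparator and checked against the outputs in this thread. The two-pass rule below is a hypothetical model, loosely in the spirit of locale-aware "word sort" routines such as the Win32 `lstrcmp`; it is an assumption, not confirmed here, that cmd.exe uses such a routine or that its rules match this sketch (punctuation, accents and non-ASCII text are ignored). The sketch sorts the set {A, D, Z, a, d, z} under both this model and plain byte order.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass collation:
   pass 1 compares case-insensitively over the whole string
   (a shorter prefix sorts first);
   pass 2, only on a pass-1 tie, puts lower case before upper case
   at the first position where the case differs. */
static int two_pass_cmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    size_t i;

    for (i = 0; a[i] != '\0' && b[i] != '\0'; ++i) {
        int d = tolower(a[i]) - tolower(b[i]);
        if (d != 0)
            return d;
    }
    if (a[i] != b[i])                  /* one string is a prefix of the other */
        return a[i] == '\0' ? -1 : 1;

    for (i = 0; a[i] != '\0'; ++i) {
        if (a[i] != b[i])              /* same letter, different case */
            return islower(a[i]) ? -1 : 1;
    }
    return 0;
}

/* qsort adapters: the model above versus plain byte order. */
static int cmp_two_pass(const void *x, const void *y)
{
    return two_pass_cmp(*(const char *const *)x, *(const char *const *)y);
}

static int cmp_bytes(const void *x, const void *y)
{
    return strcmp(*(const char *const *)x, *(const char *const *)y);
}
```

Under the model the set sorts as a A d D z Z, whereas byte order gives A D Z a d z. The model also agrees with `"a" GTR "A"` being false and `"Pink" GTR "pink"` being true.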

jwdonahue
  • I'll edit the OP's post with the above manual results when someone confirms they have checked my work. Then I can delete this or move on to an answer if I get some more time later. – jwdonahue Feb 22 '21 at 20:01
  • Um, but where are `:ReportCaseSensitivity` and `:ReportSortOrder` ? I believe `equ` & `neq` act as expected. The other operators seem to ignore case regardless. Your test `if "a" GTR "A" (echo normal ... ) else (echo reverse ... )` yielding `reverse` is inconclusive. Is it reversed because in ASCII `a>A` and batch believes `A>a`, or is it that the comparison executed is actually `A?>A` or `a?>a`, which will also yield false? – Magoo Feb 28 '21 at 06:24
  • I don't know what `A?>A` means. – jwdonahue Mar 01 '21 at 19:32
  • @Magoo, I posted the subroutines. I must have stopped selecting at the first `exit`. My buffer history shows I had a bunch of script at the tail of the file that were probably left over from working a previous problem. Although, I did make sure they were included in the output. – jwdonahue Mar 01 '21 at 19:46
  • @Magoo, and done making edits to the post, for now. – jwdonahue Mar 01 '21 at 20:13
  • `I don't know what A?>A means` - Asking the question `A is it > A` – Magoo Mar 01 '21 at 20:28
  • I think they are the same glyph and the same code point. How does your `?>` operator differ from `>`? – jwdonahue Mar 01 '21 at 20:32
  • Hmm... I am saying that batch is case sensitive, but that it considers the lower-case set to be less than the upper-case set. As in `z < A` rather than `Z < a`. – jwdonahue Mar 01 '21 at 20:39