1

I was wondering why you would use the `and` instruction instead of the `sub` instruction when converting lowercase ASCII characters to uppercase ones.

mov dx, 'a'
sub dx, 32

vs

mov dx, 'a'
and dx, 11011111b
Sep Roland
Markian
  • It doesn't really matter if you have already established that the input is lower case, so you know bit #5 is set. You can clear it by subtracting or masking as you like. But if the input can be upper case already, the `and` would leave it unchanged while the `sub` would ruin it. – Jester Jan 12 '23 at 16:05
  • `xor 0100000b` also works. – Erik Eidt Jan 12 '23 at 16:06
  • Thank you for the help. The input is established to be lowercase. – Markian Jan 12 '23 at 16:07
  • Those aren't ASCII single-quotes (or double quotes or backticks); NASM won't assemble that source. Use `mov dx, 'a'` – Peter Cordes Jan 13 '23 at 02:27

2 Answers

3

There's no performance or correctness difference if you already know the input is a lower-case alphabetic character. `and` has the advantage when you know the input is alphabetic but it might already be upper-case, since it leaves upper-case letters unmodified. (It is also useful as part of detecting alphabetic characters and normalizing to one case, either by `and` with `~0x20` or by `or` with `0x20`, as in What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?)
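
A minimal C sketch of that difference, with each helper standing in for one instruction from the question or the comments. The function names are hypothetical and exist only for this illustration; the 8-bit operand size is an assumption made to keep the example small.

```c
#include <stdint.h>

/* Each helper models one instruction applied to a single ASCII byte.
   The names are made up for this sketch. */

/* and dl, 11011111b : clear bit 5 (0x20) */
uint8_t upper_and(uint8_t c) { return (uint8_t)(c & 0xDFu); }

/* sub dl, 32 : subtract 0x20 regardless of the current case */
uint8_t upper_sub(uint8_t c) { return (uint8_t)(c - 0x20u); }

/* xor dl, 00100000b : toggle bit 5, i.e. flip the case */
uint8_t flip_xor(uint8_t c) { return (uint8_t)(c ^ 0x20u); }
```

For `'a'` (0x61) all three return `'A'` (0x41). For an input that is already `'A'`, the `and` version returns `'A'`, the `sub` version returns `'!'` (0x21), and the `xor` version returns `'a'`.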


If the next instruction is a `jcc` like `jnz`, `sub` and `and` are equally able to macro-fuse with it into a single uop on Intel Sandybridge-family CPUs, so there is no advantage there.

If using it in a loop over a zero-terminated C string, you might be doing something like `movzx edx, byte [rdi]` / `and edx, ~0x20` / `jnz .loop` at the bottom of a loop, since all alphabetic characters have non-zero bits other than the lower-case bit. (0x20 is ASCII space, so a space also masks to zero and ends the loop, just like the terminator.)

Using `sub` in that case lets you exit the loop on any character less than space, i.e. control characters such as tab or newline: `sub edx, 0x20` / `ja .loop` (which also exits on a space itself), or `jae .loop` to keep looping even on a space (but still not on tab or newline).
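
The three exit conditions can be compared with a rough C model. This is only a sketch of the loop-bottom test, not of a full conversion loop: each hypothetical helper returns how many bytes are passed over before the branch would fall through.

```c
#include <stddef.h>

/* Models: and edx, ~0x20 / jnz .loop
   Continues while the masked byte is non-zero, so it stops on NUL or space. */
size_t span_and_jnz(const char *s) {
    size_t n = 0;
    while (((unsigned char)s[n] & ~0x20u) != 0)
        n++;
    return n;
}

/* Models: sub edx, 0x20 / ja .loop
   Unsigned "above": continues only while the byte is greater than 0x20. */
size_t span_sub_ja(const char *s) {
    size_t n = 0;
    while ((unsigned char)s[n] > 0x20)
        n++;
    return n;
}

/* Models: sub edx, 0x20 / jae .loop
   Unsigned "above or equal": a space keeps the loop going. */
size_t span_sub_jae(const char *s) {
    size_t n = 0;
    while ((unsigned char)s[n] >= 0x20)
        n++;
    return n;
}
```

On `"ab cd"` the `and`/`jnz` and `sub`/`ja` models stop after 2 bytes, while the `sub`/`jae` model runs through all 5. On `"ab\tcd"` it is the other way round for `and`/`jnz`: a tab (0x09) does not mask to zero, so that model runs through all 5 bytes, while both `sub` variants stop after 2.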

Peter Cordes
2

Either one is acceptable; it's just a matter of preference. I like to use `and` myself. It shouldn't matter as long as you've checked that your character is between 'a' and 'z' first.
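
As a rough illustration of that range check, here is a small C sketch. The helper name is hypothetical, and masking is just one of the two equivalent choices once the check has passed.

```c
/* Hypothetical helper: convert to upper case only when the byte
   is a lower-case ASCII letter; everything else passes through. */
unsigned char to_upper_checked(unsigned char c) {
    if (c >= 'a' && c <= 'z') {
        /* Bit 5 is known to be set here, so clearing it (and)
           and subtracting 0x20 (sub) give the same result. */
        c = (unsigned char)(c & 0xDFu);
    }
    return c;
}
```

Without the check, masking a digit such as `'1'` (0x31) would produce 0x11, a control character, which is why the range test comes first.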

puppydrum64