I'm having trouble figuring out how to determine whether a value is a number or a letter in MASM assembly language. This program should go through an array, find the first number in it, and print that number along with the index where it was found. I'm using the Irvine32.inc library, which contains IsDigit, but for some reason it isn't working and I don't know why.

Here's the code:

TITLE Number Finder

INCLUDE Irvine32.inc

.data
AlphaNumeric SDWORD 'A', 'p', 'Q', 'M', 67d, -3d, 74d, 'G', 'W', 92d
Alphabetical DWORD 'A', 'B', 'C', 'D', 'E'
Numeric      DWORD  0, 1, 2, 3, 4, 5, 6
index        DWORD  ?
valueFound   BYTE "number found: ", 0
atIndex      BYTE "at index: ", 0
noValueFound BYTE "no numeric found", 0
spacing      BYTE ", ", 0

;DOESNT WORK CORRECTLY
;SKIPS the value 67

.code
main PROC
mov esi, OFFSET AlphaNumeric    ;point to start of array
mov ecx, LENGTHOF AlphaNumeric  ;set loop counter
mov index, 0

mov eax, 0                      ; clear eax

L1: mov al, [esi]
    call IsDigit                    ; ZF = 1 -> valid digit , ZF = 0 -> not a valid digit

;jmp if digit
jz NUMBER_FOUND 

;jmp if char
jnz CHARACTER

;this probably never gets reached
inc index
add esi, TYPE AlphaNumeric
loop L1

;if loop finishes without finding a number
jmp NUMBER_NOT_FOUND

;next iteration of loop if val is a char
CHARACTER:
add esi, TYPE AlphaNumeric
add index, 1
loop L1

NUMBER_FOUND:
mov edx, OFFSET valueFound
call WriteString                ; prints "number found"
mov eax, [esi]
call WriteInt                   ; prints the number found
mov edx, OFFSET spacing
call WriteString
mov edx, OFFSET atIndex
call WriteString                ; prints "at index: "
mov eax, index
call WriteDec                   ; prints the index value

;jmp to NEXT to skip NUMBER_NOT_FOUND block
jmp NEXT

NUMBER_NOT_FOUND:
mov edx, OFFSET noValueFound
call WriteString

NEXT:

exit
main ENDP
END main

When I debug it and it reaches the loop iteration that processes the value 67d, it loads 43 into al, which is its hex representation. Since 43h lines up with the ASCII value 'C', I'm assuming that call IsDigit processes this as a letter and not a number. It also skips all the numbers and prints "Number found: +65, at index: 10", which shouldn't even happen. Is there an operation I can use to convert the hex value to the decimal value so that the IsDigit call works correctly? If someone could please explain a way to evaluate whether a value in an array is a number or a letter (capital or lowercase), that would be very much appreciated.

JonGrimes20
  • `67` (decimal) is the same byte value as `'C'`. Once it's assembled into binary, there's no way you can tell how it was written in the source; `db 67, 'C'` is the same pair of bytes as `db 'C', 67`. It's a number that's in the range of upper-case ASCII codes. Bytes don't have types associated with them, just values. If your program doesn't keep track of types separately, that info is not recoverable. – Peter Cordes Mar 22 '22 at 19:38
  • @PeterCordes so this problem is essentially impossible? – JonGrimes20 Mar 22 '22 at 19:40
  • Yes, that's correct. Unless you change the goal to be simply checking for alphabetic ASCII codes, in which case `67` is alphabetic. https://asciitable.com/ – Peter Cordes Mar 22 '22 at 19:42
  • @PeterCordes If it was checking for ASCII alphabetic codes, wouldn't it just print 'C' and not '67'? If that's not the case, could you show me how to do this problem with ASCII values? – JonGrimes20 Mar 22 '22 at 19:47
  • It's up to you whether you choose to print output as a string of ASCII digits that represent the ASCII code numerically (convert manually or call `WriteInt` after sign-extending it into EAX), or whether you output the ASCII byte directly as a character to stdout (`call WriteChar`) – Peter Cordes Mar 22 '22 at 19:50
  • @PeterCordes A.) So if I call `WriteInt` after sign-extending it into EAX, will it print the numeric value and not the character value? B.) How would I compare the value in [esi] to determine whether it is a char or a number if I choose to go that route? – JonGrimes20 Mar 22 '22 at 19:54

1 Answer

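This is an impossible task. The most you can do is check for numbers that aren't the ASCII code for an alphabetic character (https://asciitable.com/). Index 5 (the value -3) is the first element where that's the case. Note that Irvine32's IsDigit doesn't test for that: it checks whether AL holds the ASCII code of a digit character, '0' through '9' (48 to 57), and none of your array values are in that range. That's why your loop runs off the end of the array and prints the 65 ('A') at the start of Alphabetical as "index 10".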

67 (decimal) is the same byte value as 'C'. Once it's assembled into binary bytes in your .data section, they're the same single byte. Thus there's no way you can tell how it was written in the source; db 67, 'C' is the same pair of bytes as db 'C', 67. It's a number that's in the range of upper-case ASCII codes. Another equivalent way to write the same value in the source is 43h.
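To see this concretely, here is a minimal MASM sketch, assuming the same Irvine32 setup as in the question. The array name `sameByte` and the label `show_next` are made up for illustration. It stores the value using four different source spellings and prints each element once as a character and once as a decimal number.

```asm
INCLUDE Irvine32.inc

.data
sameByte BYTE 67, 'C', 43h, 01000011b   ; four source spellings, one byte value

.code
main PROC
    mov   esi, OFFSET sameByte
    mov   ecx, LENGTHOF sameByte
show_next:
    movzx eax, BYTE PTR [esi]   ; zero-extend the byte into EAX
    call  WriteChar             ; print AL as a character
    mov   al, ' '
    call  WriteChar             ; separator
    movzx eax, BYTE PTR [esi]   ; reload the same byte
    call  WriteDec              ; print EAX as an unsigned decimal number
    call  Crlf
    inc   esi
    loop  show_next
    exit
main ENDP
END main
```

All four lines of output should be identical, because the assembler emits the same byte for every spelling.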

Bytes don't have types associated with them, just the 8-bit bit-pattern which represents a value. Different interpretations of the same bits could be different values, e.g. -3 (signed) and 253 (unsigned) are both represented by the bit-pattern 0b11111101 which is 0xfd. All of those are valid ways of writing the value that gets loaded into AL by your program. Numbers in a computer are binary; hex and decimal are just convenient formats for humans, so debuggers convert binary values into strings of ASCII digits for display. As a character value, the same bit-pattern also represents a font glyph in some 8-bit character sets.
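As a small illustration of the signed and unsigned views, the following sketch (again assuming Irvine32, whose `WriteInt` prints EAX as signed and `WriteDec` prints it as unsigned) loads the same byte twice and only changes how it is widened before printing.

```asm
INCLUDE Irvine32.inc

.code
main PROC
    mov   al, -3          ; assembles to the byte 0FDh
    movsx eax, al         ; sign-extend: EAX = 0FFFFFFFDh
    call  WriteInt        ; signed view of EAX
    call  Crlf

    mov   al, 253         ; assembles to the same byte, 0FDh
    movzx eax, al         ; zero-extend: EAX = 000000FDh
    call  WriteDec        ; unsigned view of EAX
    call  Crlf

    exit
main ENDP
END main
```

The first line should show -3 and the second 253, even though AL held the same bits both times. The program chose the interpretation; the byte did not.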

If your program doesn't keep track of types separately, that info is not recoverable.

Normally you write programs to know that a whole array holds 8-bit numbers, or that it holds ASCII codes. In C, for example, you have functions that take int8_t* or char*: both point to plain 8-bit values, but they have different semantic meaning for human programmers. Another example would be int* vs. char*. You certainly could look at the bytes of an int array as character data (with many of the characters being '\0' or '\xff' for small positive / negative integer values), but you don't try to figure out which one you have by looking at the byte values. Higher-level languages like Python and Perl store a type along with each object, like a struct { enum type; union { stuff }; }, with many types like a string including a pointer.
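If you do want to recover "was this written as a number or as a letter", the program has to store that fact itself. The sketch below is one hypothetical way to do it in MASM, not something Irvine32 provides: the `Elem` struct, its `isChar` and `val` fields, and the labels are all invented for this example. Each element carries a flag next to its value, and the loop tests the flag instead of guessing from the value.

```asm
INCLUDE Irvine32.inc

Elem STRUCT
    isChar DWORD  ?       ; 1 = meant as a character, 0 = meant as a number
    val    SDWORD ?
Elem ENDS

.data
items Elem <1, 'A'>
      Elem <1, 'p'>
      Elem <1, 'Q'>
      Elem <1, 'M'>
      Elem <0, 67>
      Elem <0, -3>
      Elem <1, 'G'>
itemCount = ($ - items) / SIZEOF Elem

valueFound   BYTE "number found: ", 0
atIndex      BYTE ", at index: ", 0
noValueFound BYTE "no numeric found", 0

.code
main PROC
    mov   esi, OFFSET items
    mov   ecx, itemCount
    mov   ebx, 0                        ; EBX = current index
scan_next:
    cmp   (Elem PTR [esi]).isChar, 0
    je    found_number                  ; flag clear: stored as a number
    add   esi, SIZEOF Elem
    inc   ebx
    loop  scan_next

    mov   edx, OFFSET noValueFound      ; loop ended without a match
    call  WriteString
    jmp   finished
found_number:
    mov   edx, OFFSET valueFound
    call  WriteString
    mov   eax, (Elem PTR [esi]).val
    call  WriteInt
    mov   edx, OFFSET atIndex
    call  WriteString
    mov   eax, ebx
    call  WriteDec
finished:
    exit
main ENDP
END main
```

With this data the loop should stop at the element holding 67 (index 4), which a value-only test cannot do because 67 is also the code for 'C'.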


Re: implementing an IsAlpha function: see "What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?" It only takes a few instructions.

;; input in DL, unmodified
IsAlpha:
    mov     eax, edx
    or      al, 20h   ; force to lower case if it wasn't already (MASM hex syntax)
    sub     al, 'a'   ; letters now map to 0..25
    cmp     al, 25    ; 'z'-'a' = index of the last letter in the alphabet
    ; setbe al        ; for a boolean 0/1 return value in AL
    ret
;; return in FLAGS: ja non_alpha    or   jbe  alphabetic
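For the weaker goal of finding the first value that is not an alphabetic ASCII code, the same range check can be inlined into the question's loop. This is a sketch under a few assumptions: it applies the check to the full 32-bit element rather than only the low byte (so a value like 141h is not mistaken for 'A'), it keeps the index in EBX instead of a memory variable, and the label names are made up.

```asm
INCLUDE Irvine32.inc

.data
AlphaNumeric SDWORD 'A', 'p', 'Q', 'M', 67, -3, 74, 'G', 'W', 92
valueFound   BYTE "number found: ", 0
atIndex      BYTE ", at index: ", 0
noValueFound BYTE "no non-alphabetic value found", 0

.code
main PROC
    mov   esi, OFFSET AlphaNumeric
    mov   ecx, LENGTHOF AlphaNumeric
    mov   ebx, 0                      ; EBX = current index
check_next:
    mov   eax, [esi]                  ; load the whole 32-bit element
    or    eax, 20h                    ; fold upper case onto lower case
    sub   eax, 'a'                    ; letters map to 0..25
    cmp   eax, 25
    ja    not_alpha                   ; unsigned above 25: not a letter code
    add   esi, TYPE AlphaNumeric
    inc   ebx
    loop  check_next

    mov   edx, OFFSET noValueFound    ; every element was a letter code
    call  WriteString
    jmp   finished
not_alpha:
    mov   edx, OFFSET valueFound
    call  WriteString
    mov   eax, [esi]                  ; reload the unmodified value
    call  WriteInt
    mov   edx, OFFSET atIndex
    call  WriteString
    mov   eax, ebx
    call  WriteDec
finished:
    exit
main ENDP
END main
```

On this array the check should stop at -3 (index 5), because 67 and every other value before it falls inside the letter ranges.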
Peter Cordes
  • I implemented the `or al, 0x20` through `cmp al, 25` lines inside of L1, but it still skipped 67 and printed "number found: -3, at index: 5", which, although correct, isn't the first number in the AlphaNumeric array. Any idea why this is happening? – JonGrimes20 Mar 22 '22 at 20:19
  • Was I supposed to change how my AlphaNumeric array is defined in the data segment? – JonGrimes20 Mar 22 '22 at 20:20
  • @JonGrimes20: Maybe you didn't understand the point of this answer, including the very first line that says "this is an impossible task". Index #5 is the first number that's not the ASCII code for an alphabetic character. That's all you can check for, not how it was written in the source. There's no way you can change the array itself to distinguish this. (Other than making it an array of structs with a boolean ischar flag stored next to each value.) – Peter Cordes Mar 22 '22 at 20:29
  • oh okay that makes sense, thank you very much – JonGrimes20 Mar 22 '22 at 20:30