Finding Occurrences in a String in Assembly x86

Question

I have been working on this program where I have to input a string and then display the character distribution in that string.

For example:
if the input is “minecode” the output should be

C – 1
O – 1
D – 1
E – 2
I – 1
M – 1
N – 1

Here is what I tried to do, but I really don't know how to traverse the loop and check for similar character and then increment the count. The assembler is MASM 615 running on a 32 bit machine.

.686
.MODEL flat, stdcall
.STACK
INCLUDE Irvine32.inc
.DATA
  msg0 BYTE "Enter a string of characters: ",0
  msg1 BYTE "Character Distribution: ",0
  MainArray    dword 10000 dup (?)
  UniqueChar   dword 10000 dup (?)
  CharCount    dword 10000 dup (?)
.CODE
MAIN PROC
    lea edx, msg0
    call WriteString
    call dumpregs        ; 1
    call ReadString
    mov MainArray, eax
    call dumpregs        ; 2
    mov MainArray, ebx
    call dumpregs        ; 3
    call CRLF
    lea edx, msg1
    call WriteString
    call CharDist        ; Calls the procedure
    call dumpregs        ; 5
    exit
MAIN ENDP
CharDist PROC
    mov ecx, lengthof MainArray
    mov esi, OFFSET MainArray
    L1:
; what to do here?? 
Loop L1:
CharDist ENDP
END MAIN

`Loop L1:` is a syntax error. You only use the trailing `:` when you define the label, not when you refer to it from elsewhere. (And don't use LOOP in the first place, [it's slow and doesn't do anything you can't do just as easily with more common instructions](http://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently).) — Peter Cordes, Nov 25 '16 at 04:01

Davislor · Answer 1 · 2016-11-27T14:59:24.837

5

One possible approach: make an array of 256 counters, store its base address in ebx, and for each byte in the string, increment the counter at that offset from ebx. Afterward, loop over your array of counters and print the nonzero counts.

You never say whether this is a string terminated by a 0 byte (C-style), one preceded by its length (Pascal-style), or one whose length is passed in as a second parameter, but that will determine when you terminate the loop. If you’re looking for a terminating zero, test the byte you just read, and if you’re counting a specific number of bytes, keep the count of bytes remaining in ecx and test that. (There are special instructions to branch conditionally if ecx is not zero, if you feel like using them.)

If you keep your pointer to the string in esi, you can load the next byte into al with the lodsb instruction. Alternatively, you can mov from [esi] and then inc esi. If you zero out eax before storing each byte in al, this will give you an index in eax that you can use with an array of counters.

edited Nov 27 '16 at 14:59

answered Nov 25 '16 at 03:48

Davislor

14,674
2
34
49

1

Yup, only 256 bins is small enough to make a standard histogram approach work well with a plain array, not a hash table or other map data structure. – Peter Cordes Nov 25 '16 at 03:51
5

However, your suggestion to use the LOOP instruction to implement the loop is not good. On most CPUs, [it's so slow you might as well forget it even exists](http://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently), unless optimizing for code-size. The other thing I'd suggest is using `movzx eax, byte [esi]` / `inc [table + eax]` inside the loop. That's maybe easier to thing about than zeroing eax outside the loop and then just modifying the low byte inside the loop (e.g. with LODSB). Both ways work, though. – Peter Cordes Nov 25 '16 at 03:53
1

I would definitely advise the OP to get the simple approach working before trying to implement a hash table in assembly language. – Davislor Nov 25 '16 at 03:54
1

My point was that a simple array *is the right solution* for this specific problem size (which is why a beginner was asigned it for homework, I assume :), but might not be for related problems with sparse sets. Using a hash table for this would be slower. If you were going to do anything, you could assume printable ASCII and do `inc [table + eax - 0x20]` or something, but probably best not to. Allocating memory that's never touched is not really a problem as far as performance. – Peter Cordes Nov 25 '16 at 03:57
1

@PeterCordes I would recommend someone with a question like this optimize for simplicity and clarity, but I’ve removed that bit of advice. – Davislor Nov 25 '16 at 04:00
Yeah, agreed. IMO, not using a special instruction just for looping *is* simpler in some ways. Or just always write `dec ecx/jnz` for identical behaviour (other than flags) if you want a cookbook recipe for looping. I think it might help to get students to understand loops better if they don't think they're somehow "magical". i.e. you just loop with normal conditional-branch instructions, and should pick whatever termination condition you can test efficiently. – Peter Cordes Nov 25 '16 at 04:04
2

Who's upvoting both our comments? Is someone lurking here hanging on our every word? – Peter Cordes Nov 25 '16 at 04:05
@PeterCordes Yes, I agree. You would want a hash table or sparse data structure if you had to cover all the range of Unicode, but it would be overkill here. 256 32-bit counters only cost 1,024 bytes of memory. – Davislor Nov 25 '16 at 04:10
Yup, for ASCII (or arbitrary bytes), a hash table would be *worse* than an array. Even with 64-bit counters in long-mode (or using SSE2 MOVQ / [PADDQ](http://www.felixcloutier.com/x86/PADDQ.html) / MOVQ in 32-bit mode), it still easily fits in L1 cache on every modern x86. And the distribution of characters is pretty dense, so the upper half of the table probably doesn't get any hits (until you scan it looking for non-zero counts, but that's not worth making the loop slower :/) (Started writing this comment before your last edit to previous comment. Wouldn't have bothered otherwise.) – Peter Cordes Nov 25 '16 at 04:17
"*for each byte in the string, increment the counter at that offset from `ebx`*" I fear a newbie wouldn't understand this sentence. What is "offset from `ebx`"? – Sep Roland Nov 27 '16 at 14:05

Ped7g · Answer 2 · 2016-11-25T12:30:46.603

Other possible approach:

Should be easier to understand and actually it is very "human" straightforward (like how would you do it on paper with pencil), so I wonder how you didn't come up with this one ... you should not only try to understand how this works, but also try to figure out why you didn't see it, what is confusing/blocking you.

for all letters of alphabet [A-Z] as "letter" {
    counter = 0;
    for all characters in input string as "char" {
        if (letter ignore_case_equals char) ++counter;
    }
    if (0 < counter) {
        display "letter - counter" and new line.
    }
}

This may be actually faster for English alphabet and short string, like 3-5 letters (containing letters only); than the suggested count sort, as alphabet is 26 chars and count table is 256 bytes, so 256/26 = ~9. But count table will reveal count for any character, including non-alphabet ones and also it will stall less on branching.

Also notice, how you did start with emitting code for prompts/input/etc... things you already could do in Assembly, avoiding the unknown.

I did start with almost-English description of algorithm. Because I don't care, what I know to write in Assembly (I already know, that pretty much anything :) ), first I want to be sure I know what I want the code do for me and what kind of data I want to process. Then I will command the CPU to do it, by finalizing the data structures plan and splitting those English notes into simpler and simpler steps, until they start to resemble instructions, then I fill up the instructions between comments.

Don't skip the native language reasoning phase, it may save you lot of work on the code, if you figure out some elegant way to organize data or cut down algorithm steps by reusing certain part of it. Not every problem has short elegant solution, but many have.

Depending on how you implement the count-matches part, that part can be branchless (CMP / `sete al` / `add ecx, eax`). The `0 < count` branch for conditional printing is also present in the histogram version. And yes, for short strings, especially with the character limited to just alphabetic characters, this could be faster. — Peter Cordes, Nov 25 '16 at 12:30
If you could guarantee that there would only be letters, you could optimize the array of buckets to be faster, too! But this also works, and it’s good to try more than one approach. — Davislor, Nov 27 '16 at 15:06

Finding Occurrences in a String in Assembly x86

2 Answers2