1

I'm writing a program in masm assembly to count and return the number of times integers appear in an array. I currently have the following code that allows me to populate an array with random integers. What I am struggling with is how to implement a counter that will store each occurrence of an integer at an index in the array. for instance, if the random array was [3,4,3,3,4,5,7,8], I would want to my count array to hold [3, 2, 1, 1, 1], as there are (three 3's, two 4's, etc).

I have the bounds of the random numbers fixed at 3/8 so I know they will be within this range. My current thinking is to compare each number to 3-8 as it is added, and increment my count array respectively. My main lack of understanding is how I can increment specific indices of the array. This code is how I am producing an array of random integers, with an idea of how I can begin to count integer occurrence, but I don't know if I am going in the right direction. Any advice?

push ebp
mov  ebp, esp
mov  esi, [ebp + 16]    ; @ holds array to store count of integer occurances
mov  edi, [ebp + 12]  ; @ holds array to be populated with random ints
mov  ecx, [ebp + 8]   ; value of request in ecx


MakeArray:              
    mov     eax, UPPER          ; upper boundary for random num in array
    sub     eax, LOWER          ; lower boundary for random num in array
    inc     eax             
    call    RandomRange
    add     eax, LOWER
    cmp     eax, 3          ; Where I start to compare the random numbers added
    je      inc_3           ; current thought is it cmp to each num 3-8
    mov     [edi], eax  ; put random number in array
    add     edi, 4      ; holds address of current element, moves to next element
    loop    fillArrLoop

inc_3:          ; if random num == 3
    inc     esi         ; holds address of count_array, increments count_array[0] to 1?
    mov     [edi], eax  ; put random number in array to be displayed
    add     edi, 4      ; holds address of current element, moves to next element
    loop    MakeArray
  • The usual method is to use the array value as an index into a table of counts. In C that would be `++counts[ arr[i] ]`. No searching needed, works great for small contiguous ranges. – Peter Cordes May 26 '20 at 01:44

2 Answers2

0

My current thinking is to compare each number to 3-8 as it is added

No, you're vastly overcomplicating this. You don't want to linear search for a j (index into the counts) such that arr[i] == j, just use j = arr[i].

The standard way to do a histogram is ++counts[ arr[i] ]. In your case, you know the possible values are 3..8, so you can map an array value to a count bucket with arr[i] - 3, so you'll operate on counts[0..5]. A memory-destination add instruction with a scaled-index addressing mode can do this in one x86 instruction, given the element value in a register.

If the possible values are not contiguous, you'd normally use a hash table to map values to count buckets. You can think about this simple case as allowing a trivial hash function.


Since you're generating the random numbers to fill arr[i] at the same time as histograming, you can combine those two tasks, and instead of subtracting 3 just don't add it yet.

; inputs: unsigned len, int *values, int *counts
; outputs: values[0..len-1] filled with random numbers, counts[] incremented
; clobbers: EAX, ECX, EDX  (not the other registers)
fill_array_and_counts:
    push ebp
    mov  ebp, esp

    push esi        ; Save/restore the caller's ESI.
 ;; Irvine32 functions like RandomRange are special and don't clobber EAX, ECX, or EDX except as return values,
 ;; so we can use EDX and ECX even in a loop that makes a function call.

    mov  edi, [ebp + 16]    ; int *counts  ; assumed already zeroed?
    mov  edx, [ebp + 12]    ; int *values  ; output pointers
    mov  ecx, [ebp + 8]     ; size_t length

    MakeArray:                       ; do{
        mov     eax, UPPER - LOWER + 1   ; size of random range, calculated at assemble time
        call    RandomRange                  ; eax = 0 .. eax-1
        add     dword ptr [edi + eax*4], 1   ; ++counts[ randval ]

        add     eax, LOWER               ; map 0..n to LOWER..UPPER
        mov     [edx], eax               ; *values = randval+3
        add     edx, 4                   ; values++

        dec     ecx
        jnz     MakeArray            ; }while(--ecx);

    pop   edi                  ; restore call-preserved regs
    pop   ebp                  ; including tearing down the stack frame
    ret

If the caller doesn't zero the counts array for you, you should do that yourself, perhaps with rep stosd with EAX=0 as a memset of ECX dword elements, and then reload EDI and ECX from the stack args.

I'm assuming UPPER and LOWER are assemble time constants like UPPER = 8 or LOWER equ 3, since you used all-upper-case names for them, and they're not function args. If that's the case, then there's no need to do the math at runtime, just let the assembler calculate UPPER - LOWER + 1 for you.


I avoided the loop instruction because it's slow, and doesn't do anything you can't do with other simple instructions.

One standard performance trick for histograms with only a few buckets is to have multiple arrays of counts and unroll over them: Methods to vectorise histogram in SIMD?. This hides the latency of store/reload when the same counter needs to be incremented several times in a row. Your random values will generally avoid long runs of the same value, though, so worst-case performance is avoided.

There might be something to gain from AVX2 for large arrays since there are only 6 possible buckets: Micro Optimization of a 4-bucket histogram of a large array or list. (And you could generate random numbers in SIMD vectors with an AVX2 xorshift128+ PRNG if you wanted.)

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
-1

If your range is fixed (3-8), you have a fixed-length array that can hold your counts:

(index0:Count of 3),(index1:Count of 4)..(index5:Count of 8s)

Once you have an element from the random array, you just take that element and put it through a switch:

cmp 3, [element]
jne compare4
mov ebx, [countsArrayAddress]     ; Can replace [countsArrayAddress] with  [ebp + 16]
add ebx, 0                        ; First index, can comment out this line
mov ecx, [ebx]
add ecx, 1                        ; Increment count
mov [ebx], ecx                    ; Count at the zeroth offset is now incremented
compare4:
cmp 4, [element]
jne compare5
mov ebx, [countsArrayAddress]
add ebx, 4                         ; Second index (1*4)
mov ecx, [ebx]
add ecx, 1
mov [ebx], ecx
...

Is this what you mean? I come from using fasm syntax but it looks pretty similar. The above block is a bit unoptimized, but think this shows how to build the counts array. The array has a fix length, which must be allocated, either on the stack (sub rsp the correct amount) or on the heap, i.e with heapalloc/malloc calls. (Edited, see you're using 32-bit registers)

user176692
  • 780
  • 1
  • 6
  • 21
  • You don't need to linear search, you can just load `[element]` into a register and use it as an index! The OP asked if they're on the right track, and no, their linear search idea is not the right track. – Peter Cordes May 26 '20 at 01:54
  • Yeah, but then isn't your counts array bigger than it needs to be? – user176692 May 26 '20 at 01:56
  • The OP's range of values is known to be 3..8. An extra 3 elements for unused 0..2 is trivial, but in general you can avoid wasting that space: use `add dword ptr [counts + eax*4 - 3*4], 1`, (where `counts` can actually be `ebp-64` or whatever). i.e. in C `++counts[ arr[i] - 3 ]`. Contiguous elements from the input range map 1:1 to contiguous elements in the count array, therefore you can just do it with subtraction, even as part of an addressing mode. – Peter Cordes May 26 '20 at 01:57
  • Yeah, I was thinking that, you could normalize the range, but I was just trying to show how to get a working implementation on putting SOMETHING into the counts array. You can extend this with taking the [element], subtract 3, add this as the offset. That's a step towards making it more elegant, and you can start to reuse the loop. Feel free to post your own answer, don't want to plagiarize you. There are other ways to improve the performance, like retrieving the [element] only once, etc. But these are performance improvements. – user176692 May 26 '20 at 02:05
  • Loading `[element]` only once into a register is also the standard / idiomatic way to write asm. Registers are asm's equivalent of local vars. Lots of things are improvements for performance *and* simplicity, e.g. indexing instead of having an inner search loop. Something that's *only* a perf improvement would be for example having multiple arrays of counts, and unrolling the (outer) loop to distribute counts over those multiple arrays to hide store-forwarding latency from incrementing the same bucket repeatedly, like in [this Q&A](https://stackoverflow.com/q/39266476) – Peter Cordes May 26 '20 at 02:14
  • That point on the dword memory addressing is good, made a fix. Have to restart, mouse out of batteries. Yeah, I agree reusing as much as possible helps in every way, keeping values in registers instead of in memory helps performance too. – user176692 May 26 '20 at 02:17
  • @PeterCordes Hi Peter, I'm still a bit confused as to how I would specifically reference each index in the array. In my example where I compare the random integer to 3 and if it is 3 I inc esi. Is this properly incrementing my array? How would I increment the array at position[1]? – SignificantGuess May 26 '20 at 02:19
  • Or for the OP's very small histogram with only 5 possible values, optimizing with AVX2 can work: I wrote an answer about a 4-bucket equivalent of this [Micro Optimization of a 4-bucket histogram of a large array or list](https://stackoverflow.com/q/61122144). – Peter Cordes May 26 '20 at 02:20
  • Anyway, I'm not really interested in writing an answer about a trivial scalar histogram loop. If you want to give it a shot, go for it. Don't worry about "plagiarism", the basic idea of indexing is well known, and you have my permission to use my suggestions from earlier comments. – Peter Cordes May 26 '20 at 02:21
  • @SignificantGuess When you 'inc' increment, you can increment addresses or values in the address. So you just need a way to handle both, like how you are adding 4 (32-bits) to edi (moving to next array addr) and putting the value in the slot separately. You can find a way to relate the indexes to the values as well, as Peter Cordes mentioned. But if you just inc esi, you can't increase the index and value at the same time. – user176692 May 26 '20 at 02:49
  • @user176692: This was bugging me so I ended up writing an answer after all. Someone had to correct the massive over-complication in the question and your answer, and show a good example of how simple this really is. I didn't find a similar-enough other questions to close this as a duplicate, although I think I have seen this linear-search for `j == element` brainfart before in an SO question. – Peter Cordes May 26 '20 at 04:06