Assembly program that finds second largest value in array

Question

I wrote a assembly program that finds the max value in an array, but now I would like it to find the second largest number in the array. How would I modify my program to do this?

This is the program I wrote and it does work. The program prints all the values in the array and then finds the max value of the array. Now I want it to find the second largest value.

 %include "io.mac"
.STACK 100H 

.DATA
   Numbers DW 3,4,5,2,6,0
   msg0  db "Printing values in array",0
   msg1  db "Max",0
   msg2  db "Min",0

   .CODE
        .STARTUP
    PutStr msg0
    mov dx, Numbers 
    mov si, dx ;point to array
    printValues:
    cmp word [si], 0
    je procedure
    nwln
    PutInt [si]
    add si, 2
    jmp printValues

    procedure:
    push Numbers ;push Number to stack to pass parameter by stack
    call maxMeth
    nwln
    PutStr msg1
    nwln

    PutInt ax
    nwln




    complete:
.EXIT





maxMeth:
    enter 0,0 ;save old bp and set bp to sp
    mov si, [bp+4] ;point to array 
    mov ax, [si]   ; ax holds max
    add si,2 ; Increment si to next number

;Now entering loop
max:   
    cmp word [si],0   ; checks to see if the number is 0 and if it is, then we are done.
    je finish
    cmp ax, [si]        ; ax holds the max . So if ax is greater than si, then dont assign si to ax. 
    jge increment
    jmp newMax
newMax: 
    mov ax, [si] ; Otherwise we have a new a max

increment:   
    add si, 2   ;now increment si to check the next value and jump back to the main loop.
    jmp max

finish: ;clean up.
    leave ;pop bp
    ret   ;return

I edited my program to keep track of the second max and the result I receive for the second max value is 3. I was expecting the program to output 5. Im not sure why I am getting the wrong output. Here are my edits to the program.

%include "io.mac"
.STACK 100H 

.DATA
   Numbers DW 3,4,5,2,6,0
   msg0  db "Printing values in array",0
   msg1  db "Max",0


   .CODE
        .STARTUP
    PutStr msg0
    mov dx, Numbers 
    mov si, dx ;point to array
    printValues:
    cmp word [si], 0
    je procedure
    nwln
    PutInt [si]
    add si, 2
    jmp printValues

    procedure:
    push Numbers ;push Number to stack to pass parameter by stack
    call maxMeth
    nwln
    PutStr msg1
    nwln

    PutInt ax
    nwln

   PutInt bx
    nwln


    complete:
.EXIT





maxMeth:
    enter 0,0 ;save old bp and set bp to sp
    mov si, [bp+4] ;point to array 
    mov ax, [si]   ; ax holds max
    mov bx, [si]   ; ax holds max
    add si,2 ; Increment si to next number

;Now entering loop
max:   
    cmp word [si],0   ; checks to see if the number is 0 and if it is, then we are done.
    je finish          ;equals 0, then finished
    cmp ax, [si]         
    jg testSecondMax   ;So if ax is greater than si, then dont assign si to ax and check second max
    jmp newMax         ;otherwise go to new max


newMax: 
    mov ax, [si] ; save new max
    jmp increment ; and increment

testSecondMax:
    cmp bx, [si]
    jl secondMax
    jg increment

secondMax:
    mov bx, [si]
    jmp increment

increment:  
    add si, 2   ;now increment si to check the next value and jump back to the main loop.
    jmp max 



finish: ;clean up.
    leave ;pop bp
    ret   ;return

The usual way is to keep track of the max and 2nd-max as you step through the array. Compare against 2nd-max, and if greater, then insert into what is effectively a 2-entry sorted list (which you should keep in registers). — Peter Cordes, Nov 24 '17 at 03:58
run your "find the largest" twice: run once, zero out all matches from the list, then run again — Tommylee2k, Nov 24 '17 at 08:33

Peter Cordes · Accepted Answer · 2017-11-24T06:53:31.293

I ended up writing code for this because it made me wonder how efficiently I could implement the loop. If you want to solve your homework yourself, don't look at my code, just the bullet points in English. Use a debugger to single-step through your code.

Here's how I would change your code:

NASM style: use local labels (like .noswap:) within functions. Indent operands to a consistent column so it doesn't look ragged. Comment your function with inputs / return value and calling convention (what registers it clobbers).
optimize out the jmp next_instruction before newMax: because it's just an expensive no-op to jump where execution was going to fall through anyway.
Unless maybe optimizing for a real 8086, don't use enter, it's slow.
Load each element being checking into a register, instead of using the same memory operand multiple times. (x86-16 has 6 integer registers other than BP/SP; use them.)
put the loop-exit conditional branch at the bottom. (Jump to there for the loop entry point if needed).
Keep a max and 2nd-max in two registers, like you're keeping max in AX.
When you find an element greater than 2nd-max, keep the highest 2 of the 3 numbers you have. i.e. maintain a 2-element queue / sorted list in 2 registers.

Untested:

; word max2Meth(word *array);
; Input: implicit-length array (terminated by a 0 element),
;        pointed to by pointer passed on the stack.  (DS segment)
; returns in ax
; clobbers: cx, dx
global max2Meth
max2Meth:
    push  bp
    mov   bp, sp     ; make a stack frame.  (ENTER is slow, don't use it)
    push  si         ; SI is call-preserved in many calling conventions.  Omit this if you want to just clobber it.

    mov   si, [bp+4] ; pointer to array 

    mov   ax, [si]   ; assume that the list is non-empty
    mov   dx, ax     ; start with max = max2 instead of putting a conditional xchg outside the loop

    jmp   .loop_entry   ; enter at the bottom, at the conditional branch
;;; ax: 2nd max
;;; dx: max

.maxloop:              ; do {
    cmp cx, ax         ; check against 2nd-max, because the common case is less than both.
    jg  .updateMaxes   ; optimize for the common case: fall through on not-found

.loop_entry:
    add  si, 2
    mov  cx, [si]      ;   c = *p++;
    test cx, cx
    jnz .maxloop       ; } while (c != 0);

.finish:
   ; 2nd-max is already in AX, just clean up and return

    pop  si
    leave    ;pop bp would be faster because SP is already pointing to the right place
    ret

; This block is part of the function, even though it's placed after the ret
.updateMaxes:
    mov  ax, cx           ; Bump the old 2nd-max unconditionally
    cmp  ax, dx
    jle  .loop_entry      ; if 2nd_max <= max, return to the loop
    xchg ax, dx           ; otherwise swap
    jmp  .loop_entry

Putting the block for a rare case out-of-line is nice, because it means the common case can just fall through with no taken branches. Often putting if/else conditions inline require a jmp somewhere just to avoid running the else part after the if. But .updateMaxes: ends up being rather elegant, IMO: it has to jump back into the loop, and we can do that either before or after swapping.

16-bit xchg is 3 uops (as expensive as 3 mov instructions), but to make that cheaper (and only do mov ax, dx / mov dx, cx) in the new-max case we'd have to make the new-2ndmax case slower. Assuming it's more likely to just update the 2nd max than to update both, I think just mov and cmp/jcc is a win there. You could make that part branchless with cmov (on a 686 CPU), which might be good. Making the whole thing branchless would give you quite a long dependency chain, and not be worth it unless the array elements were on average getting larger towards the end so you had frequent max-updates all the time (but not with a pattern, so you get branch misses).

On Intel Haswell/Skylake, the inner loop is only 4 fused-domain uops (both compare/branches can macro-fuse). Over long runs of no updates, it should run at 1 iteration per clock.

If you're optimizing for code-size over speed (e.g. for a real 8086), use ax as the temporary and lodsw instead of mov ax, [si] and add si, 2. (Keep your max in a different register).

With an implicit-length list, you can't just use scasw with 2nd-max in ax, because you need to check for 0 as well as > 2nd-max :/

As a further optimization, you could use bp instead of si, if you're using the tiny model (SS = DS), because you don't need to access the stack after loading the pointer. You can just pop bp instead of leave

Before I thought of simply entering the loop with ax = dx = first element, I was going to use this code before the loop:

    mov   ax, [si]   ; assume that the list is non-empty
    mov   dx, [si+2] ; but check if it's only 1 element long, like maxMeth does
    test  dx, dx
    jz    .finish

    add   si, 4      ; first 2 elements are loaded

    cmp   ax, dx     ; sort them so ax <= dx
    jng  .noswap
    xchg  ax, dx
 .noswap:

Another way to structure the inner loop would be like this:

.maxloop:              ; do {
    add  si, 2
    mov  cx, [si]      ;   c = *p++;

    test cx, cx
    jz .finish         ; jz instead of jnz: exit the loop on the sentinel value

    cmp cx, ax         ; check against 2nd-max, because the common case is less than both.
    jng .maxloop

;; .updateMaxes:  ;; conditionally fall through into this part 
    mov  ax, cx           ; Bump the old 2nd-max unconditionally
    cmp  ax, dx
    jle  .maxloop         ; if 2nd_max <= max, return to the loop
    xchg ax, dx           ; otherwise swap
    jmp  .maxloop

.finish:

This is probably better because we can just fall into the top of the loop to start. We get out of the loop with a jz that skips over the conditional-update stuff, so we don't have any branches that exist only to skip over code that's "in the way". (i.e. we've succeeded at laying out our code blocks efficiently.)

The only downside for some CPUs is that the test/jz and cmp/jg are back to back. Some CPUs do better when conditional branches are separated by a couple more instructions than that. (e.g. unless you get lucky on how these hit the decoders on Sandybridge, one of the two branches won't macro-fuse. But they would with the first loop.)

Reminder: Stack Overflow user contributions are licensed under cc by-sa 3.0 with attribution required, so if you copy-paste my whole code, make sure you include https://stackoverflow.com/questions/47466116/assembly-program-that-finds-second-largest-value-in-array/47467682#47467682 in a comment.

I'd like to know why this answer was downvoted. I think it clearly explains how to solve this simple problem with a good algorithm, and some notes on implementing it efficiently. — Peter Cordes, Nov 24 '17 at 06:57
Who knows, maybe someone was choking on a turkey leg and hit the wrong button. Excellent answer as usual. When something like that happens, you just have to chock it up to the phase of the moon... — David C. Rankin, Nov 24 '17 at 07:13
@DavidC.Rankin: I assume it's the same person that downvoted the question. Maybe they just don't think anyone should be involved with answering a question like this? I had another downvote on an incomplete answer (pointing out some but I assume not all of the bugs) to another debugging question a couple hours ago, otherwise I wouldn't have commented about the DV here. — Peter Cordes, Nov 24 '17 at 07:18
It is just frustrating when people don't have the integrity to stand behind their downvote. (I certainly do not see any competing answer that could be judged superior...) — David C. Rankin, Nov 24 '17 at 07:22
With a bit of unroll you could approach the limit of checking 2 values per cycle for branch based approaches (limited by 2 branches per cycle) which would be nearly twice as fast for the case that updates are infrequent. Beyond that I can only think of approaches that work for some types of distributions: if the long-run second largest value is usually at least twice as big as a typical value (true for most distributions, uniform being a notable exception) you could `or cx, [si + ...]` N values together from memory and then check to see if that was larger than the current 2nd largest. — BeeOnRope, Nov 24 '17 at 20:20
That's an optimistic check: if it returns "true" (new value is larger) you need to do a second more careful check without the `or` - but for many distributions this will be almost as predicable (as "false") as the accurate check. Oh wait, you are still hitting the limit of 2-loads/cycle. Well I guess that's the max then (in 32 or 64-bit code you could try to load multiple values at once, but why not just use SIMD at that point). — BeeOnRope, Nov 24 '17 at 20:22
@BeeOnRope: `or` isn't safe here: these are *signed* integers. You could miss a large positive if ORing in a negative number made the value negative. (IDK why `0` is the sentinel for signed integers, but the OP's code is definitely using signed comparisons). Also, it's implicit-length so you need to check each value for `0` before loading into the next cache line at least. Even for unsigned implicit-length, without being able to use memory-source `or` I don't think we come out ahead (assuming Haswell's ability to make up-to-two macro-fusions per decode block). — Peter Cordes, Nov 24 '17 at 22:47
Oops, I didn't notice it was a sentinel based loop or that it was signed. You could solve the signedness problem if you had to _support_ negative values but they weren't likely to be encountered in practice by just doing an unsigned comparison for the optimistic check on the or-accumulated value. Any negative value would cause it to fall into the slow path, but there you do signed comparisons and so it stays safe. Sentinel kills it though. — BeeOnRope, Nov 24 '17 at 23:32
@BeeOnRope: Yeah, good point with unsigned. Only SIMD can solve the sentinel problem, though. (Or SWAR bithacks for detecting a zero-word, I guess, maybe worth doing if you had 64-bit registers but no SIMD, like on some RISC CPUs.) — Peter Cordes, Nov 24 '17 at 23:40
Yeah, SWAR would work here, just like `strlen` but also then of course you have to jump through all the alignment hoops :). — BeeOnRope, Nov 24 '17 at 23:42
An approach I've used in the past to solve the "what is the distribution???" question is to dynamically select different loops at runtime. You can basically do it "for free" at the assembly level (and to a uglier, extent in C or C++ with `goto`) by kind of implementing an implicit state machine that selects between different loops. For example, you have the loops you wrote for the normal case where "found bigger" is unusual, but then when you fall into the `updateMaxes` you don't jump back to that loop but say one unrolled iteration of the loop, and if you find _again_... — BeeOnRope, Nov 24 '17 at 23:45
... that the next value is larger, you instead jump to a loop which now assumes that you have an ascending-sorted array and optimized for that case (note this case is really fast since you eliminate the sentinel check since at least if "current 2nd largest > 0" since you only need one check for both conditions. When this fails one or two times you go back to the other loop. Essentially you have an adaptive algorithm, but without paying the price of "stats tracking" like you'd normally have since you are encoding in the IP the state (only at some code in code-size). — BeeOnRope, Nov 24 '17 at 23:49
You can implement whatever even more complex transition logic "for free" by duplicating loops if you want more complex logic like "Even when I see 2 consecutive larger values, the 3rd is usually smaller so stick with the default loop". So you can have two versions of the default loop, which have identical instructions, but one of which means "i haven't made a decision on the transition logic" and the other means "I want to use behavior X when I see a higher value" and so on. — BeeOnRope, Nov 24 '17 at 23:51
@BeeOnRope: oh good point that sentinel-check elimination only works if 2nd-largest > 0. I'd thought about doing that in my answer. In some previous SO answer, I think I came up with the adaptive idea for branchless vs. branchy on a similar kind of distribution-dependent problem. — Peter Cordes, Nov 24 '17 at 23:52
The idea is also useful to pick between a branchless solution (good if unpredictable) and a branchy one (good if predictable): you unroll the branchy one a bit so that if the branches are are going both ways it switches to branchless (although you this isn't an exact proxy for "unpredictable" since the pattern might be predictable). The switch back to the branchy solution from the branchless once isn't as obviously free since you don't have branches to unroll and hide your transition in, but often something is implicitly tracked which you can check periodically as a proxy for predictability. — BeeOnRope, Nov 24 '17 at 23:58

Assembly program that finds second largest value in array

1 Answers1

Linked