1

I'm working in Visual Studio 2005 with assembly (I'm a newbie), and I want to write a program that calculates a progression defined by this rule: An = 2*An-1 + An-2. I really don't know how to work with registers, and I need just one example from you so I can continue with my exercises.

This is my code:

.386
.MODEL flat,stdcall

.STACK 4096

extern ExitProcess@4:Near

.data                               
arraysize DWORD 10

setarray  DWORD 0 DUP(arraysize)
firstvar  DWORD 1
secondvar DWORD 2

.code                               
_main:                              
mov eax,[firstvar]
mov [setarray+0],eax        
mov eax,[secondvar]
mov [setarray+4],eax

mov ecx, arraysize              ;loop definition
mov ax, 8

Lp:
mov eax,[setarray+ax-4]
add eax,[setarray+ax-4]
add eax,[setarray+ax-8]
mov [setarray+ax],eax

add ax,4;
loop Lp

add ax,4;

    push    0                   ;Black box. Always terminate
    call    ExitProcess@4       ;program with this sequence

    end   _main              ;End of program. Label is the entry point.
Peter Cordes
Elad

3 Answers

1

You can't use ax as an index register and eax as a data register at the same time: ax is the low half of eax, so loading eax overwrites your index. For 32-bit code, stick to 32-bit registers unless you know what you are doing. You inadvertently used a 16-bit addressing mode, which you probably didn't want.

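mov ecx, [arraysize]              ;loop definition: load the element count
sub ecx, 2                        ;the first two elements are already stored
mov ebx, 8                        ;byte offset of the third element

Lp:
mov eax,[setarray+ebx-4]
add eax,[setarray+ebx-4]
add eax,[setarray+ebx-8]
mov [setarray+ebx],eax

add ebx,4
dec ecx
jnz Lp                            ;DEC sets ZF (it leaves CF unchanged)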

Never ever use the loop instruction, even if some modern processors can execute it fast (most can't).

Gunther Piez
  • I wouldn't worry about optimizing out the LOOP instruction unless it's time-critical code. It is rather dramatically slower than a DEC/JNZ as far as cycle counting goes, however LOOP is more intuitive when reading the code and it's smaller (which can actually make it faster if it avoids a cache miss). – Brian Knoblauch Nov 23 '10 at 19:55
  • 1
    Well, I would argue that if you actually do some assembly programming where it is useful, you will always want to avoid loop, because it most likely will be time critical. Furthermore, if you aren't able to parse a DEC/JNC sequence immediately as "this is a loop counter with jump back", you most likely won't be able to do something useful in assembly anyway ;-) So why not do it the right way from the beginning? By the way, DEC/JNZ won't work, because DEC doesn't set the zero flag. – Gunther Piez Nov 24 '10 at 00:10
  • @hirschhornsalz: you had it backwards in this 6-year-old comment: DEC sets all the flags *except* CF. Presumably so you can use it for ADC or SHL loops. Otherwise totally agree, LOOP is just extra complexity, and not worth learning about in the first place for a beginner. It's not "easier", and usually leads to badly-designed loops that waste an extra register, or worse, trying to use it for loops that would be more easily implemented with some other loop condition. – Peter Cordes Oct 27 '16 at 03:59
  • Related: [Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](http://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently). Of modern CPUs, only AMD Bulldozer-family has a fast LOOP. – Peter Cordes Oct 27 '16 at 04:00
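To make the flag behaviour discussed in these comments concrete, here is a small C model (not assembly, and not from any of the commenters). It assumes only what the x86 manuals say about DEC: it updates ZF from the result and leaves CF untouched, so a `jnc` after `dec` really tests whatever carry an earlier instruction such as `add ebx,4` left behind. The names `struct flags`, `model_dec` and `iterations_dec_jnz` are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two flags that matter for this discussion. */
struct flags {
    bool cf;   /* carry flag */
    bool zf;   /* zero flag  */
};

/* Model of x86 DEC: ZF follows the result, CF is deliberately not written. */
static uint32_t model_dec(uint32_t value, struct flags *f)
{
    uint32_t result = value - 1;   /* wraps from 0 to 0xFFFFFFFF, like the CPU */
    f->zf = (result == 0);
    return result;
}

/* Count how often the body of "Lp: body / dec ecx / jnz Lp" runs.
   ecx must be nonzero on entry, otherwise the counter wraps around. */
static unsigned iterations_dec_jnz(uint32_t ecx)
{
    struct flags f = { false, false };
    unsigned n = 0;
    do {
        n++;                       /* stands in for the loop body */
        ecx = model_dec(ecx, &f);
    } while (!f.zf);               /* jnz: jump back while ZF is clear */
    return n;
}
```

In this model, `iterations_dec_jnz(8)` returns 8, which matches the eight elements left to compute in a ten-element array once the first two are stored.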
0

An = 2*An-1 + An-2 is almost the same formula as the Fibonacci sequence, so you can find a lot of useful stuff by searching for that. (e.g. this q&a). But instead of just an add, we need 2a + b, and x86 can do that in one LEA instruction.

You never need to keep your loop variables in memory; that's what registers are for. Instead of each iteration pulling data back out of memory (~5 cycles of latency for a store/reload round trip), the loop can keep the values in registers (0 cycles of extra latency).

Your array can go in the BSS (the .data? section in MASM) rather than .data, so you aren't storing those zeros in your object file.

arraysize equ 10     ; not DWORD: this is an assemble-time constant, not a value stored in memory

.data?
seq  DWORD arraysize DUP (?)   ; MASM syntax is count DUP (value); ? means uninitialized
; NASM equivalent:  seq RESD arraysize

.code
_main:

    mov  edx, 1         ; A[0]
    mov  [seq], edx
    mov  eax, 2         ; A[1]
    mov  [seq+4], eax

    mov  ecx, 8         ; first 8 bytes stored
    ; assume the arraysize is > 2 and even, so no checks here
seqloop:
    lea  edx, [eax*2 + edx]  ;    edx=A[n],   eax=A[n-1]
    mov  [seq + ecx], edx    ; or edx=A[n-1], eax=A[n-2]
    lea  eax, [edx*2 + eax]  
    mov  [seq + ecx + 4], eax
    ; unrolled by two, so we don't need any MOV or XCHG instructions between registers, or any reloading from memory.

    add  ecx, 8             ; +8 bytes
    cmp  ecx, arraysize*4   ; (the *4 happens at assemble time)
    jb   seqloop

    ret

Using the array index for the loop condition means we only used 3 registers total, and still don't need to save/restore any of the usual call-preserved registers ESI, EDI, EBX, or EBP. (And of course the caller's ESP is restored, too).

If you care about performance, the loop is only 6 uops (fused-domain) on Intel SnB-family CPUs. For bigger arraysize, it could run at one result per clock (one iteration per 2 clocks).
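For checking what any of the assembly versions leave in memory, a C model of the same unrolled-by-two loop can help. This is only a sketch for cross-checking values, not a replacement for the assembly; the function name `fill_sequence` is made up here, and the starting values A[0] = 1 and A[1] = 2 are taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill seq[0..count-1] with A(0)=1, A(1)=2, A(n) = 2*A(n-1) + A(n-2).
   Like the assembly, this assumes count is even and at least 2. */
static void fill_sequence(uint32_t *seq, size_t count)
{
    uint32_t a = 1;              /* plays the role of EDX */
    uint32_t b = 2;              /* plays the role of EAX */
    seq[0] = a;
    seq[1] = b;
    for (size_t i = 2; i < count; i += 2) {
        a = 2 * b + a;           /* lea edx, [eax*2 + edx] */
        seq[i] = a;
        b = 2 * a + b;           /* lea eax, [edx*2 + eax] */
        seq[i + 1] = b;
    }
}
```

Called with a ten-element array, this produces 1, 2, 5, 12, 29, 70, 169, 408, 985, 2378, which is what the ten DWORDs at seq should hold after the assembly loop finishes.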

Peter Cordes
0

I'm a beginner in assembler too, but my algorithm is a bit different:


    A   dword   1026 dup (0)          ; declare this in the data segm.

; ...

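    mov     esi, offset A              ; point at the results array
    mov     dword ptr [esi], 1         ; initialize A(0)
    mov     dword ptr [esi + 4], 2     ;  and A(1)
    add     esi, 8                     ; esi -> A(2), the first element to compute
    xor     ecx, ecx                   ; ecx counts the terms computed so far

lp: mov     eax, [esi - 4]             ; get A(n-1)
    add     eax, eax                   ; double it
    add     eax, [esi - 8]             ; add A(n-2): eax = A(n)
    mov     [esi], eax                 ; and save it
    add     esi, 4                     ; advance to the next element
    inc     ecx                        ; next n
    cmp     ecx, n                     ; be sure n is a dword; use
                                       ;   cmp ecx, dword ptr n
                                       ; if it isn't
    jb      lp                         ; loop while ecx < n

    ; now you should have the results in the A array, with
    ; esi pointing just past the last one computed
    ; (n is the number of terms after the first two, so at most 1024 here)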

I haven't assembled it to check that it works, but it should.

BlackBear