nested loops
Keep in mind that the whole "loop" is a logical concept of the programmer, not a feature of the CPU (well, x86 does have an instruction literally named loop, but you should rather NOT use it, see "Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?").
What makes a loop a loop? You have some code which is considered the "body" of the loop, and you run it N times; that's a "loop".
Nested loops describe the situation where the body of a loop (the outer loop) contains another loop (the inner loop), which executes for example M times. The inner-loop body then executes N×M times in total, because each time the outer loop iterates, the inner loop runs again from its start.
Simple x86 example:
        ; init outer loop counter (will loop till zero)
        mov ebp,7
    outer_loop_body:
        ; some outer loop body code
        ; init inner loop counter (will loop till zero)
        mov ecx,5
    inner_loop_body:
        ; some inner (nested) loop body code
        ; this inner body will execute 7x5 = 35 times in total
        dec ecx
        jnz inner_loop_body     ; repeat until counter is zero
        ; some other outer loop code
        dec ebp
        jnz outer_loop_body     ; repeat until counter is zero
        ; this code is outside of loops, executed after them
You must of course preserve your counters, i.e. the loop bodies must not overwrite them (if you run short on registers, keep the counters in memory instead, which is fine as long as you don't access them too often).
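If registers are tight, the outer counter can live in memory. A minimal sketch, assuming 32-bit MASM with the flat memory model; the variable name outer_count and the ExitProcess boilerplate are my assumptions for illustration, not something taken from your task:

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
outer_count DWORD ?             ; hypothetical variable holding the outer counter

.code
main PROC
    mov  outer_count, 7         ; init outer loop counter in memory
outer_loop_body:
    ; some outer loop body code (no register is tied up by the outer counter)
    mov  ecx, 5                 ; init inner loop counter in a register
inner_loop_body:
    ; some inner loop body code (must not modify ECX)
    dec  ecx
    jnz  inner_loop_body        ; repeat until inner counter is zero
    dec  outer_count            ; DEC can operate directly on a memory operand
    jnz  outer_loop_body        ; repeat until outer counter is zero
    INVOKE ExitProcess, 0
main ENDP
END main
```

The inner counter stays in a register because it is touched 35 times, while the memory counter is touched only 7 times.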
Your task says "indexed addressing", and the countdown counters in the example go through the values N down to 1, so they are almost usable as an index (they just need -1) if you start from the end of the source array. But that favours the "forward one position" data movement with in-place memory overwrite; for the backward move it would be better to construct a loop with an up-counting counter, like:
        ; init loop index to go from zero to N-1
        xor ecx,ecx             ; index = 0
    loop_body:
        ; some loop body code, using ECX as index
        ; advance to next index
        inc ecx
        cmp ecx,N
        ; loop until value N in index is reached
        jb loop_body
        ; this code is outside of loop, executed after it
"I also cannot figure out how to move the letters."
I would almost pay you to see what your brain tried; it makes me really curious how you think about this kind of problem. To a seasoned programmer this is obvious, but apparently it is not obvious to everyone.
I personally start with imagining the data (ignoring code and algorithm), in your case:
    source BYTE "S", "m", "i", "t", "h"
This will assemble as 5 consecutive bytes in memory with the values S, m, i, t, h (for the particular numeric values check an ASCII table, or the disassembly).
Now I imagine how the memory should look after the first iteration of the forward move (and I will do the forward move):
    source BYTE "h", "S", "m", "i", "t"
The "h" that falls off the end of the original array is placed in the first position, so the whole content wraps around.
And now I imagine some algorithm to achieve this transformation of data (i.e. calculating the desired result from the input data):
    fetch last element and keep it safe for later
    loop (from last position down to second position) do:
        read position-1 element, and store it at position
    store the original "last element" to first position
I run it quickly in my head to verify that it does the calculation I want, and fix any errors. Then I make a quick wild guess how many instructions are needed to implement each particular step; if it goes above about 5, I break that step down further into simpler steps. In this case it feels like each step is about 2-3 instructions (which usually means about twice as many in the end), so this breakdown of the algorithm would be OK for me.
Then I use those steps as base comments, and implement them with asm instructions.
It seems so obvious to me that I can't imagine what got you stuck.
And finally an example of how to overwrite the memory of an "array of bytes" in an indexed way:
        ; assume ESI already contains address of array
        ; initialized earlier by something like: mov esi, OFFSET source
        ; and ECX contains index (0..4 value)
        mov al,[esi+ecx-1]      ; loads element from position ECX-1
                                ; ^^ will go out of bounds for index 0!
        mov [esi+ecx],al        ; store that element to position ECX
Or you can use another MASM syntax option, this time specifying the array address directly as a compile-time constant:
        ; ECX contains index (0..4 value)
        mov al,source[ecx-1]    ; loads element from position ECX-1
                                ; ^^ will go out of bounds for index 0
        mov source[ecx],al      ; store that element to position ECX
The first option, with the base address in a register, lets you reuse the same code over any memory array, not just source, so I prefer it over the second variant; otherwise they are identical in functionality.
So in short, "to move the letters" you simply overwrite the original memory content with the new content, which happens to be the original values, but stored at different addresses (shifted by +1 or -1, depending on how you understand the word "forward" in the question, although I believe that example is simply wrong, and "forward" is the other way around).
If you do that in-place in the same memory, notice that you have to pick the correct order of overwriting (either from the last to the first position for "forward", or from the first to the last position for "backward"), otherwise you will copy one element's value into all the remaining positions.
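To show how the commented steps and the indexed addressing can fit together, here is a minimal sketch of one "forward" move (toward higher addresses, as I understand it above). It assumes 32-bit MASM with the flat memory model and an array of at least 2 elements; the label move_loop, the use of DL as temporary storage and the ExitProcess boilerplate are my choices for illustration, not the only way to do it:

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
source BYTE "S", "m", "i", "t", "h"

.code
main PROC
    mov  esi, OFFSET source         ; ESI = address of array
    mov  ecx, (LENGTHOF source) - 1 ; ECX = index of last element (4)
    ; fetch last element and keep it safe for later
    mov  dl, [esi+ecx]
    ; loop (from last position down to second position) do:
move_loop:
    mov  al, [esi+ecx-1]            ; read element at position-1
    mov  [esi+ecx], al              ; and store it at position
    dec  ecx
    jnz  move_loop                  ; stop once index reaches 0 (first position)
    ; store the original "last element" to first position
    mov  [esi], dl
    ; source now contains "h", "S", "m", "i", "t"
    INVOKE ExitProcess, 0
main ENDP
END main
```

The loop never runs with ECX = 0, so the [esi+ecx-1] read stays inside the array. Going from the last index down is what keeps each value from being overwritten before it has been copied.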
Now I would appreciate hearing what you were unable to figure out about this, and whether it is clear now.
edit: one more note...
Of course, to implement any calculation with asm instructions you first need some idea of what instructions are available and what they do. It helps to read through an instruction reference guide a few times to get a feel for what kinds of calculation are possible.
With x86 I would start with the 80386 ISA, as that one is reasonably short (not including FP instructions and SIMD extensions), but already contains all the common basic x86 instructions. For example, google found an original PDF from Intel at http://css.csail.mit.edu/6.858/2013/readings/i386.pdf, where chapters 3, 4, 5 and 17 are relevant (or the web version http://css.csail.mit.edu/6.858/2014/readings/i386/c17.htm ).
And make sure you understand well what CPU registers are, how many of them there are, what their "sizes" in bits are and what that means for min/max values, and what computer memory is and how it operates. I.e. the basic things about x86 computer architecture.
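As a small illustration of what register size means for the max value, here is a sketch (again assuming 32-bit MASM with the flat memory model and my own ExitProcess boilerplate) which you can single-step in a debugger while watching the registers and the carry flag:

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov  al, 255            ; AL is 8 bits: unsigned range 0..255
    add  al, 1              ; AL wraps around to 0, carry flag is set
    mov  ax, 65535          ; AX is 16 bits: unsigned range 0..65535
    add  ax, 1              ; AX wraps around to 0, carry flag is set
    mov  eax, 0FFFFFFFFh    ; EAX is 32 bits: unsigned range 0..4294967295
    add  eax, 1             ; EAX wraps around to 0, carry flag is set
    INVOKE ExitProcess, 0
main ENDP
END main
```

Note that AL is the low 8 bits of AX, which in turn is the low 16 bits of EAX, so these are three views of the same register, not three separate registers.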
Then the step "implement that comment with instructions" should be manageable (and with growing experience it will not even be as hard as it seems at first).