Multiplication of corresponding values in an array

Question

I want to write an x86 program that multiplies corresponding elements of 2 arrays (array1[0]*array2[0] and so on till 5 elements) and stores the results in a third array. I don't even know where to start. Any help is greatly appreciated.

Some questions: What's your background? Have you done any Assembly before? Have you done C? Are you sure you want an 8086 program rather than an x86 or x86-64 program? Are you familiar with Linux? — 0x777C, May 18 '19 at 20:48
I am a sophomore in college who has done C before but only know basic assembly language commands. Yes it is an x86 program. — Bunty, May 18 '19 at 20:56

score 5 · Answer 1 · edited May 19 '19 at 20:14

The first thing you'll want is an assembler. I'm personally a big fan of NASM: in my opinion it has a very clean and concise syntax, and it's also what I started on, so that's what I'll use for this answer. Other than NASM you have:

GAS
This is the GNU assembler. Unlike NASM, there are versions of it for many architectures, so if you switch architectures the directives and general way of working stay about the same and only the instructions change. GAS does, however, have the unfortunate downside of being somewhat unfriendly to people who want to use Intel syntax.

FASM
This is the Flat Assembler, an assembler written in Assembly. Like NASM, it's unfriendly to people who want to use AT&T syntax. It has a few rough edges, but some people seem to prefer it for DOS applications (especially because there's a DOS port of it) and bare metal work.

Now you might be reading 'AT&T syntax' and 'Intel syntax' and wondering what's meant by that. These are dialects of x86 assembly: they both assemble to the same machine code but reflect slightly different ways of thinking about each instruction. AT&T syntax tends to be more verbose whereas Intel syntax tends to be more minimal; however, certain parts of AT&T syntax have nicer operand orderings than Intel syntax. A good demonstration of the difference is the mov instruction:

AT&T syntax:

movl (0x10), %eax

This means get the long value (1 dword, aka 4 bytes) stored at memory address 0x10 and put it in the register eax. Take note of the fact that:

The mov is suffixed with the operand length.
The memory address is surrounded by parentheses (you can think of them like a pointer dereference in C)
The register is prefixed with %
The instruction moves the left operand into the right operand

Intel Syntax:

mov eax, [0x10]

Take note of the fact that:

We do not need to suffix the instruction with the operand size because the assembler infers it. There are situations where it can't, in which case we specify the size next to the address.
The register is not prefixed
Square brackets are used to address memory
The second operand is moved into the first operand
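
To illustrate the case where the assembler can't infer the size, here is a minimal NASM sketch. The labels counter and example are hypothetical and exist only for this illustration:

```nasm
; NASM (Intel) syntax. `counter` and `example` are hypothetical names.
section .bss
    counter resd 1              ; reserve one dword

section .text
example:
    mov eax, [counter]          ; size inferred from eax: a 32-bit load
    mov dword [counter], 1      ; no register operand, so the size must be stated
    ; mov [counter], 1          ; NASM rejects this: "operation size not specified"
    ret
```

With a register operand the size comes for free. With an immediate and a memory operand there is nothing to infer it from, hence the dword keyword.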

I will be using Intel syntax for this answer.
Once you've installed NASM on your machine you'll want a simple build script (when you start writing bigger programs use a Makefile or some other proper build system, but for now this will do):

nasm -f elf arrays.asm
ld -o arrays arrays.o -melf_i386
rm arrays.o
echo
echo " Done building, the file 'arrays' is your executable"

Remember to chmod +x the script or you won't be able to execute it. Now for the code along with some comments explaining what everything means:

global _start ; The linker will be looking for this entrypoint, so we need to make it public

section .data ; We're going on to describe our data here
    array_length equ 5 ; This is effectively a macro and isn't actually being stored in memory
    array1 dd 1,4,1,5,9 ; dd means declare dwords
    array2 dd 2,6,5,3,5

    sys_exit equ 1

section .bss ; Data that isn't initialised with any particular value
    array3 resd 5 ; Leave us 5 dword sized spaces

section .text
_start:
    xor  ecx,ecx     ; index = 0 to start
    ; In a Linux static executable, registers are initialized to 0 so you could leave this out if you're never going to link this as a dynamic executable.

    _multiply_loop:
        mov eax, [array1+ecx*4] ; move the value at the given memory address into eax
        ; We calculate the address we need by first taking ecx (which tells us which
        ; item we want) multiplying it by 4 (i.e: 4 bytes/1 dword) and then adding it
        ; to our array's start address to determine the address of the given item
        imul eax, dword [array2+ecx*4] ; This performs a 32-bit integer multiply
        mov dword [array3+ecx*4], eax ; Move our result to array3

        inc ecx ; Increment ecx
        ; While ecx is a general purpose register the convention is to use it for
        ; counting hence the 'c'
        cmp ecx, array_length ; Compare the value in ecx with our array_length
        jb _multiply_loop ; Loop again while ecx is still below array_length

    ; If the loop has concluded the instruction pointer will continue
_exit:
    mov eax, sys_exit ; The system call we want
    ; ebx is already equal to 0, ebx contains the exit status
    mov ebp, esp ; Prepare the stack before jumping into the system
    sysenter ; Call the Linux kernel and tell it that our program has concluded
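
The address calculation described in the loop's comments can be seen in isolation with lea, which computes an address without reading memory. This is only a minimal sketch, assuming the same 32-bit Linux build commands as above. The choice of index 2 and the int 0x80 exit (the approach suggested in the comments below) are picked for this illustration and are not part of the program above:

```nasm
global _start

section .data
    array1 dd 1,4,1,5,9

section .text
_start:
    mov ecx, 2                  ; pick the third element (index 2)
    lea esi, [array1+ecx*4]     ; esi = array1 + 2*4, the address of array1[2]; nothing is loaded yet
    mov ebx, [esi]              ; load the dword at that address to use as the exit status
    mov eax, 1                  ; sys_exit
    int 0x80                    ; 32-bit Linux system call: exit(ebx)
```

Running it and then checking echo $? in the shell should show 1, the value of array1[2].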

If you wanted the full 64-bit result of the 32-bit multiply, use one-operand mul. But normally you only want a result that's the same width as the inputs, in which case imul is most efficient and easiest to use. See links in the x86 tag wiki for docs and tutorials.
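
As a rough sketch of that widening case (not something the program above needs), the loop could keep all 64 bits of each product like this. The name wide_products and the label .widen_loop are hypothetical, and the int 0x80 exit is an assumption made to keep the example self-contained:

```nasm
global _start

section .data
    array_length equ 5
    array1 dd 1,4,1,5,9
    array2 dd 2,6,5,3,5

section .bss
    wide_products resq array_length   ; one qword (8 bytes) per product

section .text
_start:
    xor ecx, ecx                      ; index = 0
.widen_loop:
    mov eax, [array1+ecx*4]           ; one-operand mul uses EAX as its implicit source
    mul dword [array2+ecx*4]          ; unsigned multiply: EDX:EAX = EAX * array2[ecx]
    mov [wide_products+ecx*8], eax    ; store the low 32 bits
    mov [wide_products+ecx*8+4], edx  ; store the high 32 bits
    inc ecx
    cmp ecx, array_length
    jb .widen_loop

    mov eax, 1                        ; sys_exit
    xor ebx, ebx                      ; exit status 0
    int 0x80                          ; 32-bit Linux system call
```

For signed inputs the one-operand imul would take the place of mul. With the sample data every product fits in 32 bits, so the high half stored from edx is 0 each time.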

You'll notice that this program has no output. I'm not going to cover writing the algorithm to print numbers because we'd be here all day, that's an exercise for the reader (or see this Q&A)

However, in the meantime we can run our program in gdbtui and inspect the data. Use your build script to build, then open your program with the command gdbtui arrays. You'll want to enter these commands:

layout asm
break _exit
run
print (int[5])array3

And GDB will display the results.

edited May 19 '19 at 20:14

Peter Cordes

answered May 18 '19 at 22:27

0x777C

This code overflows all the arrays! You're doing **2 iterations too many**. Use `cmp ecx, array_length` `jb _multiply_loop`. – Sep Roland May 18 '19 at 22:55
New error! It's not JumpOnGreater `jg` that you need. Use `jb`. You might also want to keep the comments in sync. – Sep Roland May 18 '19 at 23:04
@SepRoland What did I miss with the comments? – 0x777C May 18 '19 at 23:17
"Compare the value in ecx with our array_length+1" The +1 has to go. – Sep Roland May 18 '19 at 23:19
Would the same code run for MASM if I have visual studios on Windows? – Bunty May 18 '19 at 23:33
@Bunty No, MASM cannot assemble this exact code. I would not recommend using Windows to learn Assembly since the syscall ABI on Windows is unstable and in order to really do anything you'll need to make use of Microsoft's DLLs which involves some extra work. You may be able to get NASM running via WSL though. – 0x777C May 18 '19 at 23:39
I actually already have everything set up on MASM. What are the modifications that I would have to make in order for it to work on that. – Bunty May 19 '19 at 00:21
@Bunty I don't know exactly. The instructions should mostly be fine up until _exit but assembler directives are going to vary, you'll have to look at some MASM examples. Also the sys_exit system call is going to have to change to whatever Windows uses. – 0x777C May 19 '19 at 00:45
`nasm -felf` already implies `BITS 32`. You should *not* use `BITS 32` explicitly; that only makes it possible to accidentally assemble 32-bit code into a 64-bit object file, for example. Also, you don't need 32-bit code to access 32-bit registers; your code would assemble for 16-bit mode, just with `0x66` operand-size and `0x67` address-size prefixes in the machine code. It would only run on a 386-compatible, of course. – Peter Cordes May 19 '19 at 02:32
I'd also recommend not using `sysenter` manually. The Linux kernel's `sysenter` ABI is not 100% guaranteed stable the way `int 0x80` or 64-bit `syscall` are, so [the kernel wants user-space to only use it via the VDSO](https://github.com/torvalds/linux/blob/e7d0c41ecc2e372a81741a30894f556afec24315/arch/x86/entry/entry_64_compat.S#L240). It's also more complicated for any system calls that will return (unlike `sys_exit`), because it doesn't save EIP/RIP so user-space has to provide that. – Peter Cordes May 19 '19 at 02:36
There's no need to confuse a beginner with tricky one-operand `mul`. Use [`imul eax, dword [array2+ecx*4]`](https://www.felixcloutier.com/x86/imul) for all integer multiplies; it works just like `add` (except of course it can't have a memory destination). It has no implicit operands, and accepts an immediate; no weird stuff. It's also more efficient, so it's what you should use anyway unless you specifically *want* the high-half result, or you're optimizing for code-size over speed. One-operand mul and imul are tricky things that total beginners shouldn't worry about, like `loop` or `rep movs` – Peter Cordes May 19 '19 at 02:42
Also, it's probably better to explicitly `xor ecx,ecx` at the top of `_start`. That way it still works if dynamically linked, or copied to a `main` instead of `_start`. The ABI doesn't guarantee zeroing, that's just the Linux kernel's implementation choice; I'd only take advantage of it for code-golfing or for a quick and dirty thing I was writing for my own use e.g. to test performance of something. (In which case I wouldn't comment on it; again outside of code golf / extreme size optimization, if you feel the need to write a comment explaining it, an instruction would be better.) – Peter Cordes May 19 '19 at 03:07
@PeterCordes The code you're looking at is for taking care of 32-bit syscalls, not 32-bit sysenters, which if you look at entry_32.s has been used, also Golang has become more common since that code was written and it uses syscalls directly. Even if it is more complicated, `int 0x80` is not modern but rather extremely slow and deprecated. I'm against adding `xor ecx, ecx` because I don't think code should be written if there's no need for it, I'm only writing a comment because this is aimed at a beginner. The `chmod` was for the script, not the output binary. Also why is `imul` more efficient? – 0x777C May 19 '19 at 10:16
Sorry, linked the wrong line before. https://github.com/torvalds/linux/blob/e7d0c41ecc2e372a81741a30894f556afec24315/arch/x86/entry/entry_64_compat.S#L24 is for 32-bit `sysenter` into a 64-bit kernel. `entry_32.s` is for entry points into a *32-bit kernel*, not particularly relevant for modern systems. Yes I know `int 0x80` is significantly slower. It's also more beginner-friendly. If you care about performance, use a compiler unless you're already an asm expert and can't get a compiler to make as efficient code. I wasn't aware Go inlined syscalls, though; interesting. – Peter Cordes May 19 '19 at 13:41
`imul r32, r/m32` is 1 micro-fused uop on modern Intel/AMD CPUs, but `mul m32` is 3 on Skylake for example (including a micro-fused load+multiply uop): it needs 2 extra uops to split the 64-bit multiply result into two 32-bit halves, and to write the result to EDX. (`mul r/m64` is only 2 uops; the integer multiply hardware is 64 bits wide so that case doesn't need to split the low part of the result, just write the high half.) See https://agner.org/optimize/ for Agner Fog's instruction tables and optimization guide, especially his microarch guide to understand the tables. – Peter Cordes May 19 '19 at 13:44
@PeterCordes I've switched it to an `imul` purely out of concern for performance. As for being 'beginner friendly' I think that's a wider discussion that can be taken to chat. Also although Go inlines syscalls it does use `int 0x80` at the moment (on x86-64 it uses `syscall` as usual however so I don't think it has anything to do with any VDSO concerns), I've just submitted a bug report about it. – 0x777C May 19 '19 at 14:11
Ok good, I think the kernel devs don't want anyone inlining `sysenter`; The recommended way to make efficient 32-bit system calls is to call through the VDSO. (Partly I think this is to allow future changes to the `sysenter` dance between kernel and user-space. I was going to say that not all CPUs support `sysenter`, but not all CPUs support `cmov` or SSE2 either and we typically / sometimes inline those even in 32-bit code). Inlining `syscall` in 64-bit code is totally fine though, because it's baseline for x86-64 and has a simple ABI that's guaranteed not to change. – Peter Cordes May 19 '19 at 14:17
