Sorting a string of characters in assembly

Question

I'm trying to write in assembly language a function sort. It will sort the two dimensional array such that the rows will now contain data in alphabetical order. I have tried many things but it is honestly beyond my current knowledge. here's what I tried so far...

.386
public _Sort
.model flat
.code
_Sort proc

    push ebp
    mov ebp, esp
    push esi
    push edi

    mov edi, [esp + 4]    ; address of destination array
    mov esi, [esp + 8]    ; address of source array
    mov ecx, [esp + 16]   ; # of elements to mov
    cld
    rep movsd
L1:
    mov eax, [esi]
    cmp [esi + 8], eax
    jg L2
    xchg eax, [esi + 8]
    mov [esi], eax
L2: 
    pop edi
    pop esi
    pop ebp

    ret     
_Sort endp
end

Here's the C++ code...

#include <iostream>

using namespace std;

extern "C" int Sort (char [] [20], int, int);

void main ()
    {
    char Strings [10] [20]
                    = { "One",
                        "Two",
                        "Three",
                        "Four",
                        "Five",
                        "Six",
                        "Seven",
                        "Eight",
                        "Nine",
                        "Ten"   };
    int i;
    cout << "Unsorted Strings are" << endl;
    for (i = 0; i < 10; i++)
        cout << '\t' << Strings [i] << endl;
    Sort (Strings, 10, 20);
    cout << "Sorted Strings are" << endl;
    for (i = 0; i < 10; i++)
        cout << '\t' << Strings [i] << endl;
    }

I do realize my assembly does not make sense, sorry about that.

Jerry Coffin · Accepted Answer · 2017-11-27T14:39:47.007

2

You'll want to build your assembly code in functions/procedures, just like you would code in some other language. Much like in C, string comparison, copying, etc., will need to be done in functions. Just for example:

; compares [esi] to [edi], returns +, 0 or - to indicate order
; inputs: esi, edi: addresses of strings
; destroys: esi, edi, edx
;
strcmp_int proc
    jmp short start
loop_top:
    inc esi
    inc edi
start:
    movsx eax, byte ptr [esi]
    movsx edx, byte ptr [edi]
    test edx, edx
    jz @f
    sub eax, edx
    jz loop_top
    ret
@@:
    sub eax, edx
    ret
strcmp_int endp

[warning: this code isn't necessarily intended to be used as-is--just an example of one of the kinds of functions you'll normally need to write to do this sort of job in assembly language. It's been long enough since I wrote much assembly language that you can undoubtedly do better on a modern processor--and at least for things done purely in assembly language, you normally want to put results in flags, rather than a -/0/+ in a register like strcmp (and this) produce. But note that this does return with flags set by the final sub]

See also Why is memcmp so much faster than a for loop check? for some links to optimized implementations (SSE2/AVX2) for explicit-length strings, which can be much faster for medium to long strings. For optimization links in general, see the x86 tag wiki.

Your sort will depend on the sorting algorithm you decide to implement. Obviously, a Quicksort won't look the same as an insertion sort. The main point, however, is simple: don't try to write it as a single, monolithic chunk of code—break it up into pieces that are individually easy to write and understand.

edited Nov 27 '17 at 14:39

answered Oct 26 '10 at 05:48

Jerry Coffin

476,176
80
629
1,111

`lodsb` already increments `esi`. Use `movzx eax, [esi]` for efficiency (`lodsb` is 3 uops on Intel CPUs, not worth it vs. `inc` / `movzx`). Also use `cmp al, [edi]` instead of sub, so it can macro-fuse with `jz` on more CPUs. – Peter Cordes Nov 26 '17 at 10:57
Also, this runs past the end of the strings if they're equal out to the terminating `0` >. – Peter Cordes Nov 26 '17 at 10:58
Obviously for efficiency you should compare dwords or qwords (or better use SSE2). With SSE2 it's easy to also check each byte for `0` in parallel, but for plain integer code you can use https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. (Wider loads get tricky with two implicit-length strings if they're misaligned relative to each other. [It's not as easy as just doing aligned loads like with `strlen`](https://stackoverflow.com/questions/37800739/is-it-safe-to-read-past-the-end-of-a-buffer-within-the-same-page-on-x86-and-x64).) – Peter Cordes Nov 26 '17 at 11:03
I as going to edit this, but it's not obvious how to make a non-bloated byte loop. IDK if it's practical to use `sub al, [edi]` to leave the return value in`al` for free. Maybe if we load `[edi]` into another register, so we can test it for end-of-string after `sub al, cl`. I found https://github.com/sebastiencs/asm_minilibc/blob/master/x86/strcmp.asm, but it's definitely not efficient. (`cmpsb` + an extra `cmp [mem], 0`) – Peter Cordes Nov 26 '17 at 11:18
[`loop` is *slow*](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently), but you don't seem to be optimizing for code-size over speed (you're using `movzx`/`inc` instead of `lodsb`). If you do want a compact explicit-length version, `repz cmpsb` is not great, but it's not horrible (and avoids branch mispredicts). After it's done, `movzx eax, [edi-1]` / `sub al, [esi-1]`. e.g. see https://stackoverflow.com/questions/21106801/why-is-memcmp-so-much-faster-than-a-for-loop-check – Peter Cordes Nov 27 '17 at 08:20
Here's an example like that using the subtract-afterwards: http://rayseyfarth.com/asm/src/ch08/memcmp.asm, but for some reason branches on equal / not-equal to skip the load/sub. I guess if that predicts well, it makes the return value not data-dependent on the inputs! – Peter Cordes Nov 27 '17 at 08:24
@PeterCordes: Yeah, I accidentally posted a half-rewritten version. I think what's in there now works and at least isn't necessarily horribly inefficient, though if somebody were ambitious, they could probably do better with AVX (or AVX2, AVX 512, etc.), at least assuming they could guarantee alignment and such. – Jerry Coffin Nov 27 '17 at 08:28
Yup, your last edit looks correct to me, and a good example given that caveat that it's not meant to be optimized. You *could* [put the function entry-point in the middle of the loop](https://stackoverflow.com/questions/35468465/does-a-function-with-instructions-before-the-entry-point-label-cause-problems-fo) to save the initial `jmp` :P But I don't think this question is a good place for an optimized strcmp, especially because the OP should just inline (or use a custom calling convention) so they can use a flag result instead of an actual -ve / 0 / +ve integer. – Peter Cordes Nov 27 '17 at 08:31
But we might as well include a link to high-performance stuff :) – Peter Cordes Nov 27 '17 at 08:39
1

@PeterCordes: Yup--if the mere mention of optimizing scares them off, they probably shouldn't be writing assembly language anyway. – Jerry Coffin Nov 27 '17 at 08:40
@PeterCordes: Oh, and getting rid of the initial `jmp` would actually be somewhat non-trivial--the `uses edx` means there's a hidden "push edx` at the entry, which we wouldn't want happening inside the loop. Of course, we could just eliminate that, and leave it to the caller to preserve edx (and for really optimal code, that's probably preferable). – Jerry Coffin Nov 27 '17 at 08:56
So `uses edx` saves/restores it regardless of typical calling conventions allowing you to clobber `edx` and `ecx`? – Peter Cordes Nov 27 '17 at 09:28
@PeterCordes: At least if memory serves, yes, it's supposed to (but I'd have to check to be really sure). – Jerry Coffin Nov 27 '17 at 14:26
Then I'd recommend making that a comment and letting it clobber `edx` :P – Peter Cordes Nov 27 '17 at 14:27
Totally agree about optimization being the main point of asm, BTW. But code-size can be fun, too sometimes. Speaking of sorting, I just code-golfed a [bubbly Selection Sort in 19 bytes of x86-64 machine code](https://codegolf.stackexchange.com/questions/77836/sort-an-integer-list/149038#149038), along with a x86-16 bubble-sort in 19 bytes. Not quite string sorting, but related :) – Peter Cordes Nov 27 '17 at 15:33
@PeterCordes: code size can actually be a lot of fun (but it's still optimization--just for size instead of speed). One we had a lot of fun with many years ago was a DOS program that would simply print out the number of times it had been run (so the first time you run it, it prints 1, the second time 2, and so on). No external storage allowed, of course. – Jerry Coffin Nov 27 '17 at 16:46
Yes, I meant to word my previous comment in a way that said something about optimizing for code-size, because that's how I see it, too. Editing failure :/ These days, code-size is usually just a nice-to-have when all else is equal, or worth favouring in non-hot functions. (Of course, it's not uncommon for all else to be equal even when going for all-out speed. e.g. use `vpunpcklqdq` instead of `vpshufd` to save the imm8, like in my [SSE horizontal sum answer](https://stackoverflow.com/a/35270026/224132), or how compilers use `movups` for stores instead of `movdqu`) – Peter Cordes Nov 27 '17 at 16:53
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/159914/discussion-between-jerry-coffin-and-peter-cordes). – Jerry Coffin Nov 27 '17 at 16:57

Sorting a string of characters in assembly

1 Answers1