There is a recent publication at nature.com, Faster sorting algorithms discovered using deep reinforcement learning, where it talks about AlphaDev discovering a faster sorting algorithm. This caught my interest, and I've been trying to understand the discovery.
Other articles about the topic are:
- AlphaDev discovers faster sorting algorithms
- Understanding DeepMind's AlphaDev Breakthrough in Optimizing Sorting Algorithms
Here is the pseudo-code of the original sort3 algorithm against the improved algorithm that AlphaDev discovered.
Original Pseudo-Code
Memory [0] = A
Memory [1] = B
Memory [2] = C
mov Memory[0] P // P = A
mov Memory[1] Q // Q = B
mov Memory[2] R // R = C
mov R S
cmp P R
cmovg P R // R = max(A, C)
cmovl P S // S = min(A, C)
mov S P // P = min(A, C)
cmp S Q
cmovg Q P // P = min(A, B, C)
cmovg S Q // Q = max(min(A, C), B)
mov P Memory[0] // = min(A, B, C)
mov Q Memory[1] // = max(min(A, C), B)
mov R Memory[2] // = max(A, C)
AlphaDev Pseudo-Code
Memory [0] = A
Memory [1] = B
Memory [2] = C
mov Memory[0] P // P = A
mov Memory[1] Q // Q = B
mov Memory[2] R // R = C
mov R S
cmp P R
cmovg P R // R = max(A, C)
cmovl P S // S = min(A, C)
cmp S Q
cmovg Q P // P = min(A, B)
cmovg S Q // Q = max(min(A, C), B)
mov P Memory[0] // = min(A, B)
mov Q Memory[1] // = max(min(A, C), B)
mov R Memory[2] // = max(A, C)
The improvement centers around the omission of the single move command, mov S P
. To help understand, I wrote the following assembly code. However, my testing shows that the sorting algorithm does not work when A=3, B=2, and C=1, but it does work when A=3, B=1, and C=2.
This is written, compiled, and run on Ubuntu 20.04 Desktop.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
$ nasm -v
NASM version 2.14.02
$ ld -v
GNU ld (GNU Binutils for Ubuntu) 2.34
My assembly code test...
; -----------------------------------------------------------------
;
; sort_test.asm
;
; Test for AlphaDev sorting algorithm
;
; My findings are that AlphaDev's removal of 'mov S P' doesn't work when:
; a = 3, b = 2, c = 1
; But it does work with:
; a = 3, b = 1, c = 2
;
; Output: The sorted values of a, b & c printed to stdout with spaces
;
; Compile & run with:
;
; nasm -f elf32 sort_test.asm && ld -m elf_i386 sort_test.o -o sort_test && ./sort_test
;
; -----------------------------------------------------------------
global _start
section .data
a equ 3
b equ 2
c equ 1
section .bss
buffer resb 5
section .text
_start:
; ------------------- ; AlphaDev pseudo-code
mov eax, a ; P = A
mov ecx, b ; Q = B
mov edx, c ; R = C
mov ebx, edx ; mov R S
cmp eax, edx ; cmp P R
cmovg edx, eax ; cmovg P R // R = max(A, C)
cmovl ebx, eax ; cmovl P S // S = min(A, C)
; The following line was in original sorting algorithm,
; but AlphaDev determined it wasn't necessary
; mov eax, ebx ; mov S P // p = min(A, C)
cmp ebx, ecx ; cmp S Q
cmovg eax, ecx ; cmovg Q P // P = min(A, B)
cmovg ecx, ebx ; cmovg S Q // Q = max(min(A, C), B)
; add values to buffer with spaces
add eax, 30h
mov [buffer], eax
mov [buffer+1], byte 0x20
add ecx, 30h
mov [buffer+2], ecx
mov [buffer+3], byte 0x20
add edx, 30h
mov [buffer+4], edx
; write buffer to stdout
mov eax, 4 ; sys_write system call
mov ebx, 1 ; stdout file descriptor
mov ecx, buffer ; buffer to write
mov edx, 5 ; number of bytes to write
int 0x80
mov eax, 1 ; sys_exit system call
mov ebx, 0 ; exit status 0
int 0x80
I've run this test on the command line to print the results of the sort, but I also used gdb
to step through this executable line-by-line. During this debugging, I clearly see that the register for "A", aka "P", aka "eax", is never updated when A=3, B=2, and C=1 but is updated when A=3, B=1, and C=2.
Full disclosure... I'm not an assembly programmer. I'm also not proficient in any other specific language, but I've experimented with C, C++, Javascript, PHP, & HTML to get small projects done. Basically, I'm self taught on what I do know. To get to the point to write this test, I've had to learn quite a bit. Therefore, I could certainly be making mistakes or not understanding the problem.
Anyway, please help me understand why I'm observing what I am.
- Am I misunderstanding the problem?
- Am I misunderstanding the pseudo-code?
- Am I making a mistake transforming the pseudo-code into assembly?
- Is there a mistake with my assembly code?
- Is the pseudo-code wrong?