The two obvious answers are bisection (semi-slow) and Newton-Raphson/Leibniz iteration (usually faster). To keep from spoiling anybody's fun, I'll do a reinterpret_cast on the question -- here's an implementation of an integer square root in 8086 assembly language using the Newton-Raphson technique:
isqrt proc uses di, number:word
;
; uses bx, cx, dx
;
mov di,number
mov ax,255
start_loop:
mov bx,ax
xor dx,dx
mov ax,di
div bx
add ax,bx
shr ax,1
mov cx,ax
sub cx,bx
cmp cx,2
ja start_loop
ret
isqrt endp
This is open to some improvement -- it uses x/2 as its initial guess at the sqrt(x). With 386+ instructions, you can use bsr
to find the most significant bit that's set to get a rough approximation of log2x, and divide that by 2 to get your initial approximation.
OTOH, this really only made sense on ancient processors. For anything since the 486 (or so) that has built-in floating point hardware, it's nearly certain that the FSQRT
instruction will beat this (or pretty much anything else you can write).