so why cant people just use movl to do that?
Code size, and ADD modifies flags (although you can avoid that by using LEA for the pointer increment).
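For concreteness, here is a sketch of the three alternatives being compared. It assumes 32-bit mode, Intel/NASM syntax, ESI as the source pointer, and DF=0 so that LODSD increments rather than decrements. The byte counts in the comments are for these exact encodings; other modes or registers can need extra prefixes.

```nasm
; Option 1: LODSD. 1 byte (opcode AD). Leaves flags untouched.
; Requires DF=0 and hard-codes EAX and ESI.
        lodsd                   ; eax = [esi]; esi += 4

; Option 2: MOV + ADD. 2 + 3 = 5 bytes. ADD overwrites the flags.
        mov     eax, [esi]      ; 8B 06
        add     esi, 4          ; 83 C6 04

; Option 3: MOV + LEA. 2 + 3 = 5 bytes. Flags are preserved.
        mov     eax, [esi]      ; 8B 06
        lea     esi, [esi + 4]  ; 8D 76 04
```

Options 2 and 3 also work with any pair of registers, whereas LODSD always loads into EAX through ESI.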
One of the major reasons for the existence of most complex single-byte instructions is that 8086 was almost completely bottlenecked on code-fetch. Besides the fact that memory was precious in general, code size ~= code speed on the first generation of x86 CPUs. That's definitely not the case on modern CPUs, with fast instruction caches and power-hungry decoders, and even caches for decoded instructions.
Having one-byte instructions for exchange-register-with-AX is a huge waste of 8 precious opcodes for modern x86, but was apparently useful for 8086, since MOVSX didn't exist until 386 (so you needed CBW) and other stuff required AX. (And XCHG didn't have 3x worse throughput than MOV like it does now.) Fun fact: 0x90 NOP comes from this encoding of xchg eax, eax.
are that any speed improvements
Yes, code-size always matters.
Also, on Intel P6-family and Sandybridge-family, LODSD (aka lodsl in AT&T syntax) is 3 uops until Haswell. On Haswell, LODSD/Q is only 2 uops. (LODSB/W is still 3 uops). See Agner Fog's instruction tables and microarch pdf, and other links in the x86 tag wiki, like Intel's optimization manual.
So until Haswell, it's probably best to use separate MOV and ADD instructions unless code-size is really important (e.g. in a bootloader, where speed is nearly irrelevant).