Note: this answer covers the 32-bit x86 architecture (80386DX and later). The 16-bit architecture (8086 through 80286) is similar but differs in important ways. Read the Intel 64 and IA-32 Architectures Software Developer's Manual for further information.
Furthermore, I'm using Intel syntax here. If AT&T syntax, as used in your question, is more familiar to you, tell me and I'll adjust my answer accordingly.
x86 processors have a certain set of registers.
From vol. 1, §3.4:
- General-purpose registers. These eight registers are available for storing operands and pointers.
- Segment registers. These registers hold up to six segment selectors.
- `EFLAGS` (program status and control) register. The `EFLAGS` register report on the status of the program being executed and allows limited (application-program level) control of the processor.
- `EIP` (instruction pointer) register. The `EIP` register contains a 32-bit pointer to the next instruction to be executed.
(Code formatters added.)
Segment registers contain a segment selector, which points to a segment descriptor in the Global Descriptor Table, which in turn describes a segment of linear (a.k.a. virtual)1 memory. It's complex and lengthy, so I won't delve into great detail about it here. Read the manual if you want to know more.
The colon (`:`) here is just a notation for the segment-offset combination.
Moreover, you don't need to worry about segmentation in a user program: the OS handles it completely, and the segment register values usually stay the same throughout the program's runtime.
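To make the segment-offset notation concrete, here is a minimal Python sketch of how a segment:offset pair maps to a linear address. It is a deliberately simplified model, not how the CPU is implemented: the `FLAT_BASES` table and the `linear_address` helper are hypothetical names, the table stands in for the descriptor lookup, and limit and access-rights checks are left out. The all-zero bases reflect the flat memory model that mainstream 32-bit operating systems typically set up.

```python
# Hypothetical, simplified model: each segment register maps directly to a
# segment base address. Real descriptors also carry a limit and access rights.
FLAT_BASES = {"CS": 0x00000000, "DS": 0x00000000, "ES": 0x00000000, "SS": 0x00000000}

def linear_address(segment_bases, segment, offset):
    """Return segment base + offset, wrapped to 32 bits."""
    return (segment_bases[segment] + offset) & 0xFFFFFFFF

# In a flat memory model, DS:ESI and ES:EDI with the same offset
# refer to the same linear address.
print(hex(linear_address(FLAT_BASES, "DS", 0x0804A000)))  # 0x804a000
print(hex(linear_address(FLAT_BASES, "ES", 0x0804A000)))  # 0x804a000
```

With all bases at zero, the offset in `ESI` or `EDI` is effectively the address itself, which is why user programs can usually ignore the segment part.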
Now that you roughly know what segment registers are, I'll explain the instruction itself.
From vol. 1, §7.3.9.1:
[...]
The string elements to be operated on are identified with the `ESI` (source string element) and `EDI` (destination string element) registers. Both of these registers contain absolute addresses (offsets into a segment) that point to a string element.

By default, the `ESI` register addresses the segment identified with the `DS` segment register. A segment-override prefix allows the `ESI` register to be associated with the `CS`, `SS`, `ES`, `FS`, or `GS` segment register. The `EDI` register addresses the segment identified with the `ES` segment register; no segment override is allowed for the `EDI` register. The use of two different segment registers in the string instructions permits operations to be performed on strings located in different segments.
[...]
The `CMPS` instruction subtracts the destination string element from the source string element and updates the status flags (CF, ZF, OF, SF, PF, and AF) in the `EFLAGS` register according to the results. Neither string element is written back to memory. The assembler recognizes three “short forms” of the `CMPS` instruction: `CMPSB` (compare byte strings), `CMPSW` (compare word strings), and `CMPSD` (compare doubleword strings).
(Code formatters added.)
Long story short: `CMPS` performs a `CMP` with `DS:ESI` and `ES:EDI` as operands. Interestingly, `CMP` alone cannot compare two memory operands, but `CMPS` can.
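The comparison itself can be sketched in a few lines of Python. This is only an illustrative model with a hypothetical helper name, `cmp_byte`; it covers three of the six status flags (ZF, CF, SF) and assumes both inputs are already 8-bit values in the range 0 to 255.

```python
def cmp_byte(src, dst):
    """Model the comparison done by CMPSB: compute src - dst on 8-bit
    values and return the resulting flags. The difference is discarded."""
    result = (src - dst) & 0xFF
    return {
        "ZF": result == 0,          # set when the operands are equal
        "CF": src < dst,            # unsigned borrow: src is below dst
        "SF": bool(result & 0x80),  # sign bit of the 8-bit result
    }

# Source byte 'a' (0x61) at DS:ESI versus destination byte 'b' (0x62) at ES:EDI.
print(cmp_byte(0x61, 0x62))  # {'ZF': False, 'CF': True, 'SF': True}
```

Note the operand order: the destination element is subtracted from the source element, so CF tells you that the source byte is below the destination byte as an unsigned value.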
Some instructions take their operands implicitly, and the string instructions fall into that category. They always work on `ESI` and `EDI`; only a segment override prefix is allowed (so that it's not `DS:ESI` but `FS:ESI`, for instance). Another example of implicit operands is `SHR`: `SHR AX` will shift `AX` one bit to the right. In that case, however, the rationale lies in history: the first x86 CPUs could shift only by one bit or by `CL` bits. Immediate operands were introduced later, so `SHR AX` was used back then to shift by one bit, equivalent to `SHR AX, 1`.
But why does the assembler (presumably GNU as) print the source and destination operands anyway? Good question, I can't tell for sure either. Maybe to display possible segment override prefixes.
Let's talk about the prefix `REPZ` now.
From vol. 1, §7.3.9.2:
The following repeat prefixes can be used in conjunction with a count in the ECX register to cause a string instruction [hyphen removed] to repeat:
- `REP` — Repeat while the `ECX` register not zero.
- `REPE`/`REPZ` — Repeat while the `ECX` register not zero and the ZF flag is set.
- `REPNE`/`REPNZ` — Repeat while the `ECX` register not zero and the ZF flag is clear.
(Code formatters added.)
So, the `REPZ CMPSB` instruction repeats `CMPSB` as long as `ECX` is not zero and ZF (the zero flag) is set.
From vol. 1, §3.4.3.1:
Zero flag — Set if the result is zero; cleared otherwise.
From this, and because a result of zero indicates equality, we can deduce that `REPZ CMPSB` runs as long as `BYTE PTR [DS:ESI]` equals `BYTE PTR [ES:EDI]`, at most `ECX` times. `ESI` and `EDI` are advanced after every comparison (incremented if the direction flag is clear), so when the instruction has finished, they point either just past the first unequal `BYTE PTR [DS:ESI]`-`BYTE PTR [ES:EDI]` pair or just past the last bytes compared (in case `ECX` has reached zero). The flags reflect the last comparison performed.
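The whole loop can be modelled in Python as a rough sketch. The function name `repz_cmpsb` and the single flat `memory` buffer are assumptions made for illustration: both segments are treated as having base zero, and the direction flag is assumed to be clear so that the pointers count upwards. Both pointers are advanced and `ECX` is decremented after every comparison, including the one that ends the loop.

```python
def repz_cmpsb(memory, esi, edi, ecx):
    """Model REPZ CMPSB on a flat bytes-like `memory`, with DF clear.

    Returns (esi, edi, ecx, zf) as they are once the instruction completes.
    zf is None if ECX was zero on entry: nothing is compared in that case
    and the flags are left untouched.
    """
    zf = None
    while ecx != 0:
        zf = memory[esi] == memory[edi]  # CMPSB: compare one byte pair
        esi += 1                         # both pointers advance after
        edi += 1                         # every comparison (DF = 0)
        ecx -= 1
        if not zf:                       # REPZ stops once ZF is clear
            break
    return esi, edi, ecx, zf

memory = b"hello\x00help\x00"
# "hello" starts at offset 0 and "help" at offset 6; compare up to 5 bytes.
print(repz_cmpsb(memory, 0, 6, 5))  # (4, 10, 1, False)
```

In this run the fourth pair ('l' versus 'p') differs, so the loop stops with `ECX` at 1 and both pointers one byte past the mismatching pair. The flags left behind by that last comparison are what a following conditional instruction would look at.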
To be continued with the `SETA` and `SETB` instructions soon.
All quotations refer to the Intel 64 and IA-32 Architectures Software Developer's Manual.
1 For the difference between physical, logical, and virtual addresses, see here.