29

Some CPUs (notably x86 CPUs) feature a parity flag on their status register. This flag indicates whether the number of set bits in the result of an operation is odd or even.

What actual practical purpose does the parity flag serve in a programming context?

Side note: I'm presuming it's intended to be used in conjunction with a parity bit in order to perform basic error checking, but such a task seems too uncommon to warrant an entire CPU flag.

Peter Cordes
Pharap
  • 1
    1970s hardware, like paper tape punches and serial ports, those old bits fell over much easier :) Thumbwheels and nixie tubes begat the BCD instructions, like AAA. – Hans Passant Sep 07 '14 at 06:54
  • @HansPassant BCD I understand keeping, 7-segs and nixies are still used by hobbyists (and maybe cheapskates or dot-matrix hating madmen). – Pharap Sep 07 '14 at 07:02
  • 2
    Bad news for the hobbyists I'm afraid, they were actually dropped in x64 to make room for 64-bit instructions. – Hans Passant Sep 07 '14 at 07:19
  • 1
    @HansPassant So much for 'backwards compatibility'. – Pharap Sep 07 '14 at 07:22

6 Answers

31

Back in the "old days", when performance was always a concern, it made more sense. Parity was used in communication to verify integrity (error checking), and a substantial portion of communication was serial, which relies on parity more than parallel communication does. It was trivial for the CPU to compute parity in hardware using about 8 XOR gates, but it was rather hard to compute without CPU support: in software it took an actual loop (possibly unrolled) or a lookup table, both of which were very time-consuming. So the benefits outweighed the costs. Now, though, it is more of a vestige.
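To illustrate what "a loop or a lookup table" means in practice, here is a minimal C sketch of both software approaches. The function and table names (`parity_loop`, `parity_table`, `NIBBLE_PARITY`) are made up for this illustration, and the 16-entry nibble table is just one possible table layout, not something taken from a particular historical program.

```c
#include <stdint.h>

/* Loop approach: walk over the bits one at a time.
   Returns 1 if the byte has an odd number of set bits, 0 if even. */
int parity_loop(uint8_t b)
{
    int odd = 0;
    while (b) {
        odd ^= (b & 1);   /* toggle for every set bit */
        b >>= 1;
    }
    return odd;
}

/* Table approach (hypothetical layout): parity of every 4-bit value,
   1 = odd number of set bits. */
const uint8_t NIBBLE_PARITY[16] = {
    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
};

/* Parity of a byte is the XOR of the parities of its two nibbles. */
int parity_table(uint8_t b)
{
    return NIBBLE_PARITY[b & 0x0F] ^ NIBBLE_PARITY[b >> 4];
}
```

Either version costs several instructions per byte, whereas a hardware parity flag comes for free with an instruction that was going to run anyway.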

Pharap
Dwayne Towell
  • 4
    I think you should add the reason why Intel x86 (and clones) processors today still have it - Intel has tried to retain backwards compatibility (as much as possible) with each earlier chip. So since the root of the line comes from the 8086 in 1978 each generation of chip retained that functionality across 35 years. – Michael Petch Sep 07 '14 at 05:16
  • 4
    The most weird thing here is that initially Intel designed to provide this flag after _each_ operation which modifies condition codes. A less strange approach is to provide a separate instruction which checks parity and modifies a selected CC flag (e.g. CF) but 8080 wasn't designed this way, and now even 64-bit operations set parity in the same manner. This combination of one-step forecasting and keeping old legacy is the most horrible Intel feature. – Netch Sep 07 '14 at 05:19
  • @Netch Such is the price of backwards compatibility. Personally I'm ok with tearing something up and replacing it if the new method is superior, but I think it's probably a little late in the game to change this one. – Pharap Sep 07 '14 at 06:04
  • @Pharap there is NO such price. Both the change to the 32-bit ISA and the change to the 64-bit one could have been the place to remove all bits of weird old crap. The reason it wasn't ever done is the brain damaged policy of "compatibility" when it can be easily ignored without any real problem. – Netch Sep 07 '14 at 06:56
  • 4
    32-bit support didn't really change the ISA so much as extend it to 32 bits. It would've meant special-casing 32-bit operations to not generate the parity bit, for no actual gain. The hardware for parity generation would still be there, there would just now be additional hardware to conditionally not generate it. And remember, back in 1985 when the 80386 was introduced, parity and the cost of calculating it were still relevant. As for the 64-bit ISA, you still have the issue of additional silicon space being used to disable a feature that still needs to be implemented for backwards compatibility. – Ross Ridge Sep 07 '14 at 15:45
  • 3
    And if you want an ISA that didn't follow the "brain damaged policy of 'compatibility'" Intel called it Itanium. – Ross Ridge Sep 07 '14 at 15:49
  • @RossRidge when I want such ISA, I don't search it at Intel;( – Netch Sep 08 '14 at 06:49
  • tl:dr; there is no use of this flag for a programmer, except for code obfuscation? – sivizius Sep 14 '17 at 16:52
  • @sivizius yes. It's used for swapping bits or [comparison in x87](https://stackoverflow.com/questions/25707130/what-is-the-purpose-of-the-parity-flag-on-a-cpu/25707223#comment79463092_43433515) – phuclv Sep 27 '20 at 00:56
17

TL;DR

The Parity Flag is a relic from the old days, used for parity checking in software.

What is parity

As Randall Hyde put it in The Art of Assembly Language, 2nd Edition:

Parity is a very simple error-detection scheme originally employed by telegraphs and other serial communication protocols. The idea was to count number of set bits in a character and include an extra bit in the transmission to indicate whether that character contained an even or odd number of set bits. The receiving end of the transmission would also count the bits and verify that the extra "parity" bit indicated a successful transmission.

Why Parity Flag was added to CPU architecture

In the old days there was serial communication hardware (UARTs) that lacked the ability to do parity checking on transmitted data, so programmers had to do it in software. Some really old devices, like paper tape punches and readers, also used 7 data bits and a parity bit, and programmers had to check the parity in software to verify data integrity. To use a parity bit for error detection, the communicating parties have to agree in advance on whether every transmitted byte should have odd or even parity (this is part of the communication protocol).
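As a rough sketch of the "7 data bits and a parity bit" scheme described above, the following C code assumes both sides agreed on even parity and that the parity bit travels in bit 7. The helper names (`odd_bits`, `add_even_parity`, `check_even_parity`) are invented for this example.

```c
#include <stdint.h>

/* Returns 1 if the byte has an odd number of set bits, 0 if even. */
int odd_bits(uint8_t b)
{
    b ^= b >> 4;   /* fold the high nibble onto the low nibble */
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1;
}

/* Sender side: put a parity bit in bit 7 so that the full byte
   always contains an even number of set bits. */
uint8_t add_even_parity(uint8_t ch7)
{
    ch7 &= 0x7F;                                  /* keep the 7 data bits */
    return (uint8_t)(ch7 | (odd_bits(ch7) << 7));
}

/* Receiver side: returns 1 if the byte still has even parity. */
int check_even_parity(uint8_t received)
{
    return !odd_bits(received);
}
```

For instance, `add_even_parity('C')` yields 0xC3, and flipping any single bit of that byte makes `check_even_parity` return 0. Two flipped bits cancel out and go undetected, which is why parity is only a very basic check.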

The primary methods for parity checking in software without CPU support are bit counting and lookup tables. Both are very expensive compared to a Parity Flag that the CPU computes as a side effect of a single instruction. For that reason, in April 1972 Intel introduced the Parity Flag in its 8-bit 8008 CPU. Here is an example (in x86 syntax) of how each byte can be tested for integrity on the receiving end:

mov        al,<byte to be tested>
test       al,al
jp         <somewhere>         ; byte has even parity
                               ; byte has odd parity 

Then a program could perform all sorts of conditional logic based on the value of the Parity Flag.

Evolution of conditional parity instructions in Intel CPUs

  • 1972 - the Parity Flag is first introduced with Intel 8008. There are conditional instructions for jumps (JPO, JPE), calls (CPO, CPE) and returns (RPO, RPE).
  • 1978 - Intel 8086 drops everything except for conditional jumps (JNP/JPO, JP/JPE).
  • 1985 - Conditional set instructions SETPE/SETP and SETPO/SETNP are added with Intel 80386.
  • 1995 - Conditional move instructions CMOVP/CMOVPE, CMOVNP/CMOVPO are added with Pentium Pro.

The set of instructions that make use of the Parity Flag has remained unchanged since then.

Nowadays the primary purpose of this flag has been taken over by hardware. To quote Randall Hyde in The Art of Assembly Language, 2nd Edition:

Serial communications chips and other communications hardware that use parity for error checking normally compute the parity in hardware; you don't have to use software for this purpose.

The age of the Parity Flag shows in the fact that it covers only the low 8 bits of a result, so it is of limited use. According to the Intel® 64 and IA-32 Architectures Software Developer's Manuals, the Parity Flag is:

Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise.
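The definition quoted above can be modelled in a few lines of C. This is only a software model for illustration (the name `parity_flag` is made up); it shows that bits above the low byte never influence the flag.

```c
#include <stdint.h>

/* Software model of the x86 Parity Flag for a given result value:
   returns 1 if the least-significant byte contains an even number
   of 1 bits, 0 otherwise. Higher bits are ignored. */
int parity_flag(uint64_t result)
{
    uint8_t low = (uint8_t)result;   /* only the low 8 bits count */
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;                 /* bit 0 is now 1 for odd parity */
    return !(low & 1);
}
```

So `parity_flag(0x03)` and `parity_flag(0x103)` both give 1: the extra set bit in position 8 changes the parity of the whole value but not the flag.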

Interesting fact: by his own account, networking engineer Wolfgang Kern at one point scanned all the code he had ever written (~14 GB) for JPE and JPO instructions and found them only in an RS232 driver module and in a very old 8-bit calculation.


Alex Yursha
  • 1
    *80386 instead of 80383 – sivizius Sep 14 '17 at 16:49
  • 2
    @sivizius: these days the main use for x86's PF is in floating-point code, because FP compares set PF when the result is unordered (one or both operands are NaN). This is for historical x87 reasons: `fucom st1` / `FNSTSW AX` / `sahf` ends up putting `c2` from the FP status word into PF, and later instructions like `fucomi` and SSE `ucomiss` put the compare result into integer flags directly with the same mapping. (See http://www.ray.masmcode.com/tutorial/fpuchap7.htm). Here's a real example of x86-64 code-gen by gcc7.2 using `JP`: https://godbolt.org/g/hHRCzv – Peter Cordes Sep 16 '17 at 02:09
  • Obviously in this case it's not actually a parity bit like the question is asking about, but you were commenting on the other answer. (See also [the `fucomi`](http://felixcloutier.com/x86/FCOMI:FCOMIP:%20FUCOMI:FUCOMIP.html) instruction-set manual entry for a table of how it sets flags.) – Peter Cordes Sep 16 '17 at 02:11
  • 1
    @AlexYursha: to get the parity of a 64-bit integer using PF for the low 8: `x ^= (x>>32);` `x^=(x>>16);` `xor al, ah`, then PF is set according to the parity of the whole thing. PF saves you another three steps of shift/xor. (And without BMI2 `rorx` to copy+shift, `x^= x>>16` takes a MOV / SHR / XOR.) – Peter Cordes Sep 16 '17 at 02:18
  • It's better with AVX, where you can `vpsrlq ymm1, ymm0, 32` / `vpxor ymm0, ymm0, ymm1`, so only 2 instructions to narrow by two inside each 64-bit element. To get the parity of a very long bitstring, of course you `vpxor` in 256b chunks until you have one vector at the end to horizontal xor. (And then you'd use shuffle/vpxor to get down to one 8-bit element, starting like you would for a horizontal sum: https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86). Anyway, PF just saves 3 steps at the end, and is useful in any rare case you want parity. – Peter Cordes Sep 16 '17 at 02:20
  • @PeterCordes There is no `jp` in the code at this link you posted. – sivizius Sep 18 '17 at 14:38
  • @sivizius: in https://godbolt.org/g/hHRCzv? Look at line 8 of the asm output: `JP .L7`. gcc does a very poor job of CSEing, sorry I could have picked a better example with fewer other branches. This (https://godbolt.org/g/sY5cbo) is simpler: jump if Unordered (`jp`), then jump if not equal (`jne`). – Peter Cordes Sep 18 '17 at 14:44
  • NoScript prevented it from loading because it does not like potential JS code in the URL, but now I see it, thx. – sivizius Sep 18 '17 at 14:51
  • 1
    @Alex: oops, I just realized that PF isn't part of the optimal solution for parity of wider registers on CPUs with `popcnt`. As [Cody Gray points out](https://stackoverflow.com/a/43929095/224132), `popcnt rax, rax` / `and eax, 1` gives you the parity of a 64-bit register. No need to narrow down to 8 bit and `setp`. – Peter Cordes Sep 19 '17 at 06:01
  • 1
    @Peter Cordes: Your link on `pushf` is wrong. It is an 8086-level instruction, not 186. This error was apparently fixed some time between your link's revision and the one which I extracted from NASM 2.05: https://ulukai.org/ecm/doc/insref.htm#insPUSHF – ecm Apr 01 '21 at 19:27
  • 1
    @ecm: ok that makes sense, `pushf` is a pretty essential instruction (the only way to get at some of the bits in FLAGS except for making an exception push them) so it would have been really weird for 8086 not to have it. – Peter Cordes Apr 01 '21 at 19:32
  • 1
    The useful part of that earlier comment was: `lahf` loads AH from the low byte of FLAGS ([including PF as bit #2](https://en.wikipedia.org/wiki/FLAGS_register)), so you can build a `setp` out of that (with some shift/AND) without a branch on CPUs before 386. Although it's probably faster on 8086 to just branch, especially without 186 `shr ah, 2`, unless you're ok with `lahf` / `and ah, 1<<2` to get a 0 or 4 instead of 0 or 1. And of course `pushf` can push the whole (E)FLAGS. So you can read PF in ways other than `jp` / `jnp` even on 8086. – Peter Cordes Apr 01 '21 at 19:35
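The 64-bit parity folding discussed in the comments above can be sketched in C roughly as follows. The function names are invented for this example, the last step emulates in software what PF would provide for the low byte, and `__builtin_popcountll` is a GCC/Clang builtin, so the second function assumes one of those compilers.

```c
#include <stdint.h>

/* Parity of a 64-bit value by XOR-folding down to one byte.
   Returns 1 for an odd number of set bits, 0 for even. */
int parity64_fold(uint64_t x)
{
    x ^= x >> 32;                /* fold 64 -> 32 bits */
    x ^= x >> 16;                /* fold 32 -> 16 bits */
    x ^= x >> 8;                 /* fold 16 -> 8 bits (the `xor al, ah` step) */

    /* On x86, PF would now describe the low byte directly.
       Here the remaining three fold steps are done by hand. */
    uint8_t low = (uint8_t)x;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    return low & 1;
}

/* Alternative from the later comment: population count, then keep
   the lowest bit. Assumes GCC or Clang for the builtin. */
int parity64_popcount(uint64_t x)
{
    return __builtin_popcountll(x) & 1;
}
```

The three hand-written steps at the end of `parity64_fold` are exactly what the hardware flag saves.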
5

There's one practical micro-optimization achievable with parity -- bit swapping, as used e.g. in Fourier transform address generation with the butterfly kernel.

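To swap bits 7 and 0, one can test the parity of (a&0x81) and conditionally apply (a^=0x81): odd parity means the two bits differ, so flipping both swaps them, while even parity means they are equal and nothing needs to change. Repeat for bits 6/1, 5/2 and 4/3.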
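A minimal C sketch of this trick, with the parity test done in software where an x86 implementation would read PF after an `and`. The names `odd_parity8`, `swap_pair` and `reverse_bits8` are made up for the example.

```c
#include <stdint.h>

/* Returns 1 if the byte has an odd number of set bits. */
int odd_parity8(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1;
}

/* Swap the two bits selected by `mask` (exactly two bits set).
   If exactly one of them is set (odd parity) they differ, and
   flipping both swaps them. If they are equal, nothing changes. */
uint8_t swap_pair(uint8_t a, uint8_t mask)
{
    if (odd_parity8(a & mask))
        a ^= mask;
    return a;
}

/* Reverse all 8 bits by swapping the four mirrored pairs. */
uint8_t reverse_bits8(uint8_t a)
{
    a = swap_pair(a, 0x81);   /* bits 7 and 0 */
    a = swap_pair(a, 0x42);   /* bits 6 and 1 */
    a = swap_pair(a, 0x24);   /* bits 5 and 2 */
    a = swap_pair(a, 0x18);   /* bits 4 and 3 */
    return a;
}
```

For example, `reverse_bits8(0x12)` returns 0x48.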

Aki Suihkonen
3

I was one of the original writers for Computer World New Zealand in the 1980s, and earlier had been an MIS Manager for a major utility, which used Datapoint computers. It was three men at Datapoint (formerly Computer Terminal Corporation), a Texan company based in San Antonio, who designed the 8008 in 1967: Victor Poor, Harry Pyle and Jonathan Schmidt. They asked Texas Instruments to make it, and it did, then decided not to continue with the building of microprocessors (not a great decision!). They then gave that task to Intel.

In 1971 Datapoint introduced an 8K desktop machine wrapped round that chip, which by then had become the 8080 iirc, and in 1977 they invented the LAN, which they called the ARC (Attached Resource Computer), so they gave the world two very important advances. In 1967 Victor Poor had had the idea of running diskless machines over high-speed intercity links to central repositories of data (that would never catch on!). Hence 'Datapoint', analogous to power-point.

It fell to Jonathan Schmidt, then a teenager, to write the code, and he told me he could not write code that would run fast enough to check parity, so he put a parity flag into the chip. When I interviewed him, the 80286 was the level reached, and he said to me: 'Funny thing, that flag's still in the chip.' Datapoint had continued developing the chip till the 286, then handed development to Intel. So that is why there's a parity flag: teenager Jonathan Schmidt could not write code that would run fast enough to check parity on the first cut of that chip when it was being designed in Victor Poor's house in 1967.

2

Personally, I think that rumours of the parity flag's death have been greatly exaggerated. It can be extremely useful in certain circumstances. Consider the following assembly language procedure:

push       rbp
mov        rbp, rsp
xor        eax, eax
ucomisd    xmm0, xmm1
setnp      al
pop        rbp
ret

This takes two double-precision arguments in xmm0, xmm1, and returns a boolean result. See if you can figure out what it's doing.

Dave Jewell
  • 3
    Indeed, that's the main use of PF in modern code, but it's not actually parity of anything. (Spoiler alert for what it does: [comments on a previous answer](https://stackoverflow.com/questions/25707130/what-is-the-purpose-of-the-parity-flag-on-a-cpu/66907854#comment79463092_43433515) mention this use-case.) – Peter Cordes Apr 01 '21 at 16:44
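(Spoiler for the puzzle above.) Going by the comments on the earlier answer, `ucomisd` sets PF when the comparison is unordered, so `setnp` yields 1 only when neither operand is NaN. A rough C equivalent, using the standard `isunordered` macro from `<math.h>` and a made-up function name, might look like this:

```c
#include <math.h>

/* Returns 1 if a and b can be ordered (neither is NaN), 0 otherwise.
   This mirrors `ucomisd xmm0, xmm1` followed by `setnp al`. */
int is_ordered(double a, double b)
{
    return !isunordered(a, b);
}
```

Infinities still compare as ordered; only a NaN operand makes the result 0.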
1

Sometimes you just need to wait until somebody comes up with a clever use case for abandoned features: Efficient n-states on x86 systems

With just one test, it's possible to check a variable for 4 states like this:

; flag is one of {-1, 0, 1, 3}
testl $-1,flag     ; AND flag with -1 (result == flag) and set condition codes
jz    case_1       ; jump if flag == 0
js    case_2       ; jump if flag < 0
jp    case_3       ; jump if flag > 0 and its low byte has even parity (flag == 3)
; fall through: flag > 0 and its low byte has odd parity (flag == 1)
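To make the order of the three tests explicit, here is a small C model of the dispatch above. The function name `classify` and the return values 1 to 4 are invented for this sketch; ZF, SF and PF are computed the way a `test` of the 32-bit value would set them, with PF looking at the low byte only.

```c
#include <stdint.h>

/* Models the jz / js / jp chain for flag in {-1, 0, 1, 3}.
   Returns 1..3 for case_1..case_3, 4 for the fall-through case. */
int classify(int32_t flag)
{
    uint8_t low = (uint8_t)flag;   /* PF only sees the low byte */
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;

    int zf = (flag == 0);          /* zero flag */
    int sf = (flag < 0);           /* sign flag */
    int pf = !(low & 1);           /* parity flag: even number of 1 bits */

    if (zf) return 1;              /* jz case_1 */
    if (sf) return 2;              /* js case_2 */
    if (pf) return 3;              /* jp case_3 */
    return 4;                      /* fall through */
}
```

Note that 0 and -1 also have even parity in their low byte, which is why the `jz` and `js` tests have to come before `jp`.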

Furthermore, it's possible to make branches in a fashion JNNJNJJNNJN… or NJJNJNNJJNJ… (J=Jump; N=No Jump)

Things I can think of are:

  • JNNJ is the inverse of NJJN
  • JNJ, NJN, JNNJ and NJJN are also palindromes
  • You could loop only once in a set of 3, or twice in a set of 3, then repeat. This is actually used in an assembly demo somewhere. Just find the right starting point for the exact pattern you need.
Olorin
  • 3
    `cmpl $0, flag` would [set all FLAGS the same way](https://stackoverflow.com/questions/147173/testl-eax-against-eax) (according to the 32-bit value of `flag`), but be a shorter instruction. (8-bit immediate instead of 32; `test` doesn't have a sign-extended imm8 form.) (The comment section on Andi Kleen's blog doesn't seem to be working, or at least wouldn't load for me, even in an incognito browser window with no extensions, so I couldn't comment there as well.) – Peter Cordes Jan 17 '23 at 14:23
  • The TEST is against -1 (0ffffh), not itself like in your earlier commentary. But I'm having a lazy day and just brute forced it, just to prove that you're actually also right in this case : ; AX=BX=CX=0 on start start: test cx,-1 pushf pop ax cmp cx,0 pushf pop bx cmp ax,bx loope start je match nomatch: mov ah,2 mov dl,'N' int 21h match: mov ah,2 mov dl,'O' int 21h mov dl,'K' int 21h int 20h Output is "OK" in DOSBOX-X – Olorin Jan 19 '23 at 13:45
  • `x & x == x`, exactly the same result as `x & -1 == x`, for all `x`. The whole point is to set FLAGS according to `x` (the register or memory value), like comparing against `0`. TEST's flag-setting depends only on the result, not either source operand, so `test same,same` and `test $-1,x` are of course the same. Thus reducing it to the linked case. – Peter Cordes Jan 19 '23 at 15:59
  • Yes but I wanted to rule out unforeseen behaviour. In any case I wasn't aware of the equality. It's an interesting find in any case to shave off superfluous bytes :-) – Olorin Jan 20 '23 at 14:46
  • You didn't realize that `x & -1 == x`? That's why `test` works at all for uses like this, setting FLAGS according to a value. Both follow pretty obviously if you think about how AND masking actually works; or just work through the 2 entries of the truth table (not 4 because it's either the same input twice or one input fixed). – Peter Cordes Jan 20 '23 at 14:56
  • No I meant I didn't realize that cmp x,0 = test x,x And you never know about side effects on a real CPU. – Olorin Jan 22 '23 at 17:07
  • 1
    Ok sure, that's less obvious, but that's why I commented. As for side-effects on real CPUs, you mean performance? Yeah, that can vary by microarchitecture, like whether it's better to load into a temporary register since `cmp $imm, flag(%rip)` can't micro or macro-fuse on Intel CPUs. If you mean architectural effects (correctness), it's fully documented so you *do* in fact know that it's guaranteed to work on all past and future x86 CPUs. – Peter Cordes Jan 22 '23 at 17:17