Imagining two types of boolean pairs within the same language without further ado

Question

I have just a little theoretical curiosity. The == operator in C returns 1 in case of positive equality, 0 otherwise. My knowledge of assembly is very limited. However I was wondering if it could be possible, theoretically, to implement a new operator that returns ~0 in case of positive equality, 0 otherwise – but at one condition: it must produce the same number of assembly instructions as the == operator. It's really just a theoretical curiosity, I have no practical uses in mind.

EDIT

My question targets x86 CPUs, however I am very curious to know if there are architectures that natively do that.

SECOND EDIT

As Sneftel has pointed out, nothing similar to the SETcc instructions [1] – but able to convert flag register bits into 0/~0 values (instead of the classical 0/1) – exists. So the answer to my question seems to be no.

THIRD EDIT

A little note. I am not trying to represent a logical true as ~0, I am trying to understand if a logical true can also be optionally represented as ~0 when needed, whithout further effort, within a language that already normally represents true as 1. And for this I had hypothized a new operator that “returns” numbers, not booleans (the natural logical true “returned” by == remains represented as 1) – otherwise I would have asked whether == could be re-designed to “return” ~0 instead of 1. You can think of this new operator as half-belonging to the family of bitwise operators, which “return” numbers, not booleans (and by booleans I don't mean boolean data types, I mean anything outside of the number pair 0/1, which is what a boolean is intended in C as a result of a logical operation).

I know that all of this might sound futile, but I had warned: it is a theoretical question.

However here my question seems to be addressed explicitly:

Some languages represent a logical one as an integer with all bits set. This representation can be obtained by choosing the logically opposite condition for the SETcc instruction, then decrementing the result. For example, to test for overflow, use the SETNO instruction, then decrement the result.

So it seems there is no direct instruction, since using SETNE and then decrementing means adding one more instruction.

If you design an ISA that supports such an operation, and a compiler that makes use of that operation, then sure. — Christian Gibbons, Apr 09 '19 at 15:52
VBA pretty much does this (when `=` is used for comparison). That evaluates to 'True' which is -1. — Bathsheba, Apr 09 '19 at 15:53
Whether that's possible will probably differ depending on the specific architecture you're targetting. You haven't specified that. — Michael, Apr 09 '19 at 15:53
A theoretical architecture can have a single assembly instruction running Windows. — Eugene Sh., Apr 09 '19 at 15:55
@EugeneSh. I imagine such an architecture to be designed by Microhard. — Christian Gibbons, Apr 09 '19 at 15:56
@ChristianGibbons - that should be macrohard, the same company referenced in the Lucas Arts game Sam & Max Hit The Road. — rcgldr, Apr 09 '19 at 15:57
@Bathsheba I am talking about the assembly implementation of a theoretical new operator, not about VBA. — madmurphy, Apr 09 '19 at 15:59
Your edit makes a little sense without context. The assembly code generated out of the `==` operator is very dependent on what else it is doing. It might jump somewhere, it might assign something or call a function. Or even ignored and optimized out. — Eugene Sh., Apr 09 '19 at 16:00
@rcgldr I only wanted to play on changing `soft` to `hard`, since it's still Windows, but in Hardware rather than Software. But Macrohard is certainly a good choice for a parody name. Plus I always appreciate the old-school point-and-clicks. It just brought to mind Space Quest's "Droids B Us". I can't remember if they got in legal trouble with that one since Lucasfilm owned the rights to the term "Droid". — Christian Gibbons, Apr 09 '19 at 16:00
Tag an architecture. On most architectures the equality operator used for branching produces different instructions that the one used as an expression. In the latter case the number of instructions depends on the compiler so your question is turning into: "Can we express both operators with the same number of instructions?" Yes, of course, eventually you lengthen the shortest one. — Margaret Bloom, Apr 09 '19 at 16:01
@EugeneSh. I know. But imagine the simplest case, imagine a `return var_a == var_b;` (not optimized, of course). — madmurphy, Apr 09 '19 at 16:02
C operators aside, comparisons in SSE (arguably part of x86) do result in 0 or ~0 depending on their outcome. — harold, Apr 09 '19 at 16:20
x86 has a marginally documented `salc` instruction that does more or less this. — fuz, Apr 09 '19 at 16:42
Sure, you could add a new operator to the language and define its semantics, but that's *independent of any implementation*. The language definition only specifies what the result should be, not how it's to be computed. It's up to the compiler implementation to map behavior onto specific machine instructions. And, thanks to the "as if" rule, if the compiler can generate the result using something other than the instructions you expect, it's free to do so. — John Bode, Apr 09 '19 at 17:20
I know, and I agree, John. But in my idea of creating a language I would never add features that don't have a facility in at least some architectures. That's just my personal opinion. The spectrum of high level languages is already quite well covered, and for low-level programming there is C (if not assembly). But I do think that C needs to be rewritten (especially its poor ability to evaluate constant expressions during compile time). I will probably not be the one in charge of such a task, but I am still curious about how this could be done! — madmurphy, Apr 09 '19 at 17:30

Seva Alekseyev · Answer 1 · 2019-04-09T16:47:41.750

3

EDIT: as other people are pointing out, there are some flavors of "conditionally assign 0/1" out there. Kind of undermines my point :) Apparently, the 0/1 boolean type admits a slightly deeper optimization than a 0/~0 boolean.

The "operator returns a value" notion is a high level one, it's not preserved down to the assembly level. That 1/0 may only exist as a bit in the flags register, or not even that.

In other words, assigning the C-defined value of the equality operator to an int sized variable is not a primitive on the assembly level. If you write x = (a == b), the compiler might implement it as

cmp a, b ; set the Z flag
cmovz x, 1 ; if equals, assign 1
cmovnz x, 0 ; if not equals, assign 0

Or it can be done with conditional jumps. As you can see, assigning a ~0 as the value for TRUE will take the same commands, just with a different operand.

None of the architectures that I'm familiar with implement equality comparison as "assign a 1 or 0 to a general purpose register".

edited Apr 09 '19 at 16:47

answered Apr 09 '19 at 16:02

Seva Alekseyev

59,826
25
160
281

Thanks, Seva. That answers my question, I guess! – madmurphy Apr 09 '19 at 16:06
2

I believe MIPS's `slt` (set on less than) assigns 1 or 0 to a general purpose register. – Christian Gibbons Apr 09 '19 at 16:13
x86 `cmov` only works with register or memory source operands, not immediate. But your code works as pseudocode if you have values in registers. Except you'd actually `xor ecx, ecx` to zero it, then `cmp eax,ebx` / `cmove ecx, edi` (where EDI holds `-1`). You wouldn't use a chain of 2 `cmov` instructions dependent on each other, that would unnecessarily create extra latency. – Peter Cordes Apr 10 '19 at 02:48

Gem Taylor · Answer 2 · 2019-04-09T18:57:11.990

For just 1 extra cycle you can just negate the /output/.

Internally in 8086, the comparison operations only exist in the flags. Getting the value of the flags into a variable takes extra code. It is pretty much the same code whether you want true as 1 or -1. Generally a compiler doesn't actually generate the value 0 or 1 when evaluating an if statement, but uses the Jcc instructions directly on the flags generated by comparison operations. https://pdos.csail.mit.edu/6.828/2006/readings/i386/Jcc.htm

With 80386, SETcc was added, which only ever sets 0 or 1 as the answer, so that is the preferred arrangement if the code insists on storing the answer. https://pdos.csail.mit.edu/6.828/2006/readings/i386/SETcc.htm

And there are lots of new compare instructions that save results to registers going forward. The flags have been seen as a bottleneck for instruction pipeline stalls in modern processors, and very much are disfavoured by code optimisation.

Of course there are all sorts of tricks you can do to get 0, 1, or -1 given a particular set of values to compare. Needless to say the compiler has been optimised to generate 1 for true when applying these tricks, and wherever possible, it doesn't actually store the value at all, but just reorganises your code to avoid it.

@rcgldr But that doesn't generate -1 for true, it generates true for false. — Gem Taylor, Apr 09 '19 at 18:56
@rcgldr but the assembler instruction SETcc doesn't have the option for what value to set - it can only set 1 for true, never -1. — Gem Taylor, Apr 10 '19 at 11:32

Sneftel · Accepted Answer · 2019-04-09T16:14:20.373

2

There is no assembly implementation of a C operator. For instance, there is no x86 instruction which compares two arguments and results in a 0 or 1, only one which compares two arguments and puts the result in a bit in the flag register. And that's not usually what happens when you use ==.

Example:

void foo(int a, int b) {
    if(a == b) { blah(); }
}

produces the following assembly, more or less:

foo(int, int):
        cmp     %edi, %esi
        je      .L12
        rep  ret
.L12:
        jmp     blah()

Note that nothing in there involves a 0/1 value. If you want that, you have to really ask for it:

int bar(int a, int b) {
    return a == b;
}

which becomes:

bar(int, int):
        xor     %eax, %eax
        cmp     %edi, %esi
        sete    %al
        ret

I suspect the existence of the SETcc instructions is what prompted your question, since they convert flag register bits into 0/1 values. There is no corresponding instruction which converts them into 0/~0: GCC instead does a clever little DEC to map them. But in general, the result of == exists only as an abstract and optimizer-determined difference in machine state between the two.

Incidentally, I would not be surprised at all if some x86 implementations chose to fuse SETcc and a following DEC into a single micro-op; I know this is done with other common instruction pairs. There is no simple relationship between a stream of instructions and a number of cycles.

edited Apr 09 '19 at 16:14

answered Apr 09 '19 at 16:08

Sneftel

40,271
12
71
104

Yes, you really got the point, Sneftel! This sentence answers my question completely: “I suspect the existence of the `SETcc` instructions is what prompted your question, since they convert flag register bits into 0/1 values. There is no corresponding instruction which converts them into 0/~0”. I know that the result of a comparison disappears most of the times, but in a context where you store the results of comparisons into bitmasks it certainly doesn't! – madmurphy Apr 09 '19 at 16:18
1

@madmurphy Careful there. While there's no equivalent to `SETcc` that sets all the bits, in SSE-land (which is where you'll be spending most of your time if you actually care about optimization) different techniques are used, and 0/~0 is likely to be faster than 0/1. That's what I mean about a "difference in machine state". See https://godbolt.org/z/9xC-sF . Note the `vpsrld` instructions in the 0/1 case, which right-shifts ~0 to make it into a 1. – Sneftel Apr 09 '19 at 16:31
Thanks again, Sneftel. Despite being quite good with C, I am really a complete foreigner in the assembly world. I would ask you this question: If you had to design your own language from scratch, would you include such an operator – let's call it “Charlie's Equality” (and of course you will have to include all the other “Charlie's Logical AND”, “Charlie's Logical OR”, and so on…)? In cases where `0`/`~0` is what is really needed, do you think that having implemented such operator(s) would benefit a language and its optimizability? – madmurphy Apr 09 '19 at 16:41
1

If I were implementing a language, `==` would produce a boolean result, which would not have a canonical integer value. (In other words, I'd do the same thing as virtually every language designed in the last 20 years.) Since the conversion from a boolean to an integer mapping would not be a significant factor in the performance of any real code, I wouldn't give its low-level consequences much thought. – Sneftel Apr 09 '19 at 16:45
My question wasn't about replacing the == operator (about which I agree with you, but about adding a new (set of) operator(s). The == operator would continue to exist in this hypothetical new language. It's a theoretical question, so you shouldn't use “practicality” as a parameter to consider. Also this operator doesn't “return” booleans, it “returns” numbers (think of it as half-belonging to the family of bitwise operators – although *it is* a logical one). – madmurphy Apr 09 '19 at 16:56
No, I wouldn't have such an operator. It would just be syntactic sugar for `a == b ? X : Y` for appropriate X and Y, and wouldn't be used often enough to make it worthwhile. And again, optimization considerations wouldn't come into play there. – Sneftel Apr 09 '19 at 17:03
@Sneftel fun fact: in Ruby, there's something called `object_id`. For normal objects it appears to be a pointer, for small integers it's 2x+1. `false` has an object ID of 0, and `true` has an object ID of 20. For `nil`, it's 8. All implementation dependent. – John Dvorak Apr 09 '19 at 17:20
@Sneftel I wouldn't underestimate sugar! – madmurphy Apr 09 '19 at 17:53
2

No x86 ISAs do any fusion of setcc. compare-and-branch makes up maybe 10 to 20% of the total instruction count for many programs, according to some sources. That's the only case of macro-fusion that any CPUs implement. (Actually SnB-family [will fuse add/sub, inc/dec, or AND with a JCC](https://stackoverflow.com/questions/31771526/x86-64-assembly-loop-conditions-and-out-of-order), but others are limited to TEST and CMP). setcc/dec is much rarer and definitely not worth fusing; real code almost always uses CMOV instead of creating a 0 / -1 mask and using it. – Peter Cordes Apr 10 '19 at 02:39
1

@madmurphy: Fun fact, there is an undocumented 8086 SALC instruction (set AL from carry) that's still supported in 16/32-bit mode by modern Intel (and AMD?) CPUs. It's basically like `sbb al,al` without setting flags: `AL = CF ? -1 : 0`. http://www.os2museum.com/wp/undocumented-8086-opcodes/. AMD CPUs even recognize `sbb same,same` as independent of the old value of the register, only dependent on CF, like xor-zeroing. On other CPUs, `sbb edx,edx` does give you a 0 / -1 but with a false dependency on the old value of EDX. – Peter Cordes Apr 10 '19 at 02:44
@PeterCordes That's really interesting. As a matter of fact *I am* thinking about how to design a new language, that learns from C, but adds to this a strong compile-time constant evaluation ability (the strongest ever done), loop unrolling facilities, a lot of low-level sugar operators, plus some other nice ideas. Basically the purpose is to remain as low-level as possible, giving the developer even more control than C. The only abstraction higher-level than C I could think of in this new language is some support for objects' methods (although implemented slightly differently than in C++). – madmurphy Apr 10 '19 at 03:17

Peter Cordes · Answer 4 · 2019-04-10T03:43:11.657

SIMD vector comparisons do produce vectors of 0 / -1 results. This is the case on x86 MMX/SSE/AVX, ARM NEON, PowerPC Altivec, etc. (They're 2's complement machines, so I like to write -1 instead of ~0 to represent the elements of all-zero / all-one bits).

e.g. pcmpeqd xmm0, xmm1 replaces each element of xmm0 with xmm0[i] == xmm1[i] ? -1 : 0;

This lets you use them as AND masks, because SIMD code can't branch separately on each vector element without unpacking to scalar and back. It has to be branchless. How to use if condition in intrinsics

e.g. to blend 2 vectors based on a condition, without SSE4.1 pblendvb / blendvps, you'd compare and then AND / ANDNOT / OR. e.g. from Substitute a byte with another one

    __m128i mask = _mm_cmpeq_epi8(inp, val);     // movdqa xmm1, xmm0 / PCMPEQB xmm1, xmm2

    // zero elements in the original where there was a match (that we want to replace)
    inp = _mm_andnot_si128(mask, inp);   // inp &= ~mask;  // PANDN xmm0, xmm1

    //  zero elements where we keep the original
    __m128i tmp = _mm_and_si128(newvals, mask);   // newvals & mask; // PAND xmm3, xmm1

    inp = _mm_or_si128(inp, tmp);             // POR xmm0, xmm1

But if you want to count matches, you can subtract the compare result. total -= -1 avoids having to negate the vector elements. How to count character occurrences using SIMD

Or to conditionally add something, instead of actually blending, just do total += (x & mask), because 0 is the identity element for operations like ADD (and some others like XOR and OR).

See How to access a char array and change lower case letters to upper case, and vice versa and Convert a String In C++ To Upper Case for examples in C with intrinsics and x86 asm.

All of this has nothing to do with C operators and implicit conversion from boolean to integer.

In C and C++, operators return a boolean true/false condition, which in asm for most machines for scalar code (not auto-vectorized) maps to a bit in a flag register.

Converting that to an integer in a register is a totally separate thing.

But fun fact: MIPS doesn't have a flags register: it has some compare-and-branch instructions for simple conditions like reg == reg or reg != reg (beq and bne). And branch on less-than-zero (branch on the sign bit of one register): bltz $reg, target.

(And an architectural $zero register that always reads as zero, so you can use that implement branch if reg !=0 or reg == 0).

For more complex conditions, you use slt (set on less-than) or sltu (set on less-than-unsigned) to compare into an integer register. Like slt $t4, $t1, $t0 implements t4 = t1 < t0, producing a 0 or 1. Then you can branch on that being 0 or not, or combine multiple conditions with boolean AND / OR before branching on that. If one of your inputs is an actual bool that's already 0 or 1, it can be optimized into this without an slt.

Incomplete instruction listing of classic MIPS instructions (not including pseudo-instructions like blt that assemble to slt into $at + bne: http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html

But MIPS32r6 / MIPS64r6 changed this: instructions generating truth values now generate all zeroes or all ones instead of just clearing/setting the 0-bit, according to https://en.wikipedia.org/wiki/MIPS_architecture#MIPS32/MIPS64_Release_6. MIPS32/64 r6 is not binary compatible with previous MIPS ISAs, it also rearranged some opcodes. And because of this change, not even asm source compatible! But it's a definite change for the better.

Fun fact, there is an undocumented 8086 SALC instruction (set AL from carry) that's still supported in 16/32-bit mode by modern Intel (and AMD?) CPUs.

It's basically like sbb al,al without setting flags: AL = CF ? -1 : 0. http://os2museum.com/wp/undocumented-8086-opcodes.

Subtract-with-borrow with the same input twice does x-x - CF on x86, where CF is a borrow for subtraction. And x-x is of course always zero. (On some other ISAs, like ARM, the carry flag meaning is opposite for subtraction, C set means "no borrow".)

In general, you can do sbb edx,edx (or any register you want) to convert CF into a 0 / -1 integer. But this only works for CF; the carry flag is special and there's nothing equivalent for other flags.

Some AMD CPUs even recognize sbb same,same as independent of the old value of the register, only dependent on CF, like xor-zeroing. On other CPUs it still has the same architectural effect, but with a microarchitectural false dependency on the old value of EDX.

Imagining two types of boolean pairs within the same language without further ado

4 Answers4