Reading CF, PF, ZF, SF, OF

Question

I am writing a virtual machine for my own assembly language, I want to be able to set the carry, parity, zero, sign and overflowflags as they are set in the x86-64 architecture, when I perform operations such as addition.

Notes:

I am using Microsoft Visual C++ 2015 & Intel C++ Compiler 16.0
I am compiling as a Win64 application.
My virtual machine (currently) only does arithmetic on 8-bit integers
I'm not (currently) interested in any other flags (e.g. AF)

My current solution is using the following function:

void update_flags(uint16_t input)
{
    Registers::flags.carry = (input > UINT8_MAX);
    Registers::flags.zero = (input == 0);
    Registers::flags.sign = (input < 0);
    Registers::flags.overflow = (int16_t(input) > INT8_MAX || int16_t(input) < INT8_MIN);

    // I am assuming that overflow is handled by trunctation
    uint8_t input8 = uint8_t(input);
    // The parity flag
    int ones = 0;
    for (int i = 0; i < 8; ++i)
        if (input8 & (1 << i) != 0) ++ones;

    Registers::flags.parity = (ones % 2 == 0);
}

Which for addition, I would use as follows:

uint8_t a, b;
update_flags(uint16_t(a) + uint16_t(b));
uint8_t c = a + b;

EDIT: To clarify, I want to know if there is a more efficient/neat way of doing this (such as by accessing RFLAGS directly) Also my code may not work for other operations (e.g. multiplication)

EDIT 2 I have updated my code now to this:

void update_flags(uint32_t result)
{
    Registers::flags.carry = (result > UINT8_MAX);
    Registers::flags.zero = (result == 0);
    Registers::flags.sign = (int32_t(result) < 0);
    Registers::flags.overflow = (int32_t(result) > INT8_MAX || int32_t(result) < INT8_MIN);
    Registers::flags.parity = (_mm_popcnt_u32(uint8_t(result)) % 2 == 0);
}

One more question, will my code for the carry flag work properly?, I also want it to be set correctly for "borrows" that occur during subtraction.

Note: The assembly language I am virtualising is of my own design, meant to be simple and based of Intel's implementation of x86-64 (i.e. Intel64), and so I would like these flags to behave in mostly the same way.

What exactly is the problem you're experiencing? Does your code not work? — ApproachingDarknessFish, Mar 26 '16 at 05:38
`input < 0` will never be true because `input` is unsigned and OF will be dependent on the operands, not only the result. For example, 8-bit operation `0x7f + 0x02 = 0x81` will result in `OF = 1`, but `0x82 + 0xff = 0x81` will result in `OF = 0`. This code is wrong, so some code that sets flags correctly is more neat than this way. — MikeCAT, Mar 26 '16 at 06:43

score 1 · Answer 1 · edited May 23 '17 at 12:15

1

TL:DR: use lazy flag evaluation, see below.

input is a weird name. Most ISAs update flags based on the result of an operation, not the inputs. You're looking at the 16bit result of an 8bit operation, which is an interesting approach. In the C, you should just use unsigned int, which is guaranteed to be at least uint16_t. It will compile to better code on x86, where unsigned is 32bit. 16bit ops take an extra prefix and can lead to partial-register slowdowns.

That might help with the 8bx8b->16b mul problem you noted, depending on how you want to define the flag-updating for the mul instruction in the architecture you're emulating.

I don't think your overflow detection is correct. See this tutorial linked from the x86 tag wiki for how it's done.

This will probably not compile to very fast code, especially the parity flag. Do you need the ISA you're emulating/designing to have a parity flag? You never said you're emulating an x86, so I assume it's some toy architecture you're designing yourself.

An efficient emulator (esp. one that needs to support a parity flag) would probably benefit a lot from some kind of lazy flag evaluation. Save a value that you can compute flags from if needed, but don't actually compute anything until you get to an instruction that reads flags. Most instructions only write flags without reading them, and they just save the uint16_t result into your architectural state. Flag-reading instructions can either compute just the flag they need from that saved uint16_t, or compute all of them and store that somehow.

Assuming you can't get the compiler to actually read PF from the result, you might try _mm_popcnt_u32((uint8_t)x) & 1. Or, horizontally XOR all the bits together:

x  = (x&0b00001111) ^ (x>>4)
x  = (x&0b00000011) ^ (x>>2)
PF = (x&0b00000001) ^ (x>>1)   // tweaking this to produce better asm is probably possible

I doubt any of the major compilers can peephole-optimize a bunch of checks on a result into LAHF + SETO al, or a PUSHF. Compilers can be led into using a flag condition to detect integer overflow to implement saturating addition, for example. But having it figure out that you want all the flags, and actually use LAHF instead of a series of setcc instruction, is probably not possible. The compiler would need a pattern-recognizer for when it can use LAHF, and probably nobody's implemented that because the use-cases are so vanishingly rare.

There's no C/C++ way to directly access flag results of an operation, which makes C a poor choice for implementing something like this. IDK if any other languages do have flag results, other than asm.

I expect you could gain a lot of performance by writing parts of the emulation in asm, but that would be platform-specific. More importantly, it's a lot more work.

edited May 23 '17 at 12:15

Community

1
1

answered Mar 26 '16 at 06:32

Peter Cordes

328,167
45
605
847

Thank you for that, I will look in to _mm_popcnt_u32. And the name of the argument "input" was a bad choice.. A couple of follow on questions: What is wrong with my overflow checking? I based it on what the Intel's architecture manual said "Overflow flag — Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic." Also your XOR'ing thing, how does that work? – Isaac Mar 26 '16 at 08:06
Also lazy evaluation would probably end up being less efficient, and certainly more complicated given my architectures design. – Isaac Mar 26 '16 at 08:21
@Isaac: parity *is* the xor of all bits. XOR is add without carry. My suggested method is like a SIMD horizontal sum, but bitwise and using XOR. – Peter Cordes Mar 26 '16 at 09:00
@Isaac: lazy eval is really easy to implement: instead of calling the flag-update function in the implementation of every operation that updates it, just store the `uint16_t`. Use accessor methods for the flags. Anywhere you read `Registers::flags.carry`, instead read `Registers::flags.carry()`. – Peter Cordes Mar 26 '16 at 09:04
@Isaac: MikeCAT's comment on your question explains the problem with your overflow flag. The problem is that the `int16_t` you get from casting your `uint16_t` will never be negative. The upper 7 bits (above the possible 9 bit result of adding two 8bit numbers) are zero-extended, not sign-extended. – Peter Cordes Mar 26 '16 at 09:07
Thank you, Would update_flags(uint32(int32(a) + int32(b))) work? Also the problem with lazy-evaluation is that the flags are simply stored in a block of memory, and instructions can access it like any other block of memory, they can even access a contiguous stream of memory which contains the flags 'register' somewhere. – Isaac Mar 26 '16 at 09:18
@Isaac: you're writing this in c++, so it's extremely easy to change your design to use accessor functions, like I suggested. Any other way of implementing lazy flag eval is fine, assuming you're emulating an x86-like ISA where flag writing is much more common than flag reading. If you want it to be fast, writing the emulator in asm (or a high-level language with flags) is probably your only other option. At least use lazy eval for `PF`, since it's almost never used. IDK whether your `int32` idea would work. Think it through yourself for an overflow and non-overflow input. – Peter Cordes Mar 26 '16 at 09:32
1

If you're serious about writing an emulator, [this paper](http://www.emulators.com/docs/LazyOverflowDetect_Final.pdf) describes how Bochs and other high-performance emulators handle flags. – Raymond Chen Mar 26 '16 at 14:15
@RaymondChen: Thanks for that link. I didn't know real emulators used lazy flag eval, but I wasn't surprised that I wasn't the first to think of it. :P Interesting that they can store just the result and the carry-out (which the OP does by doing operations at double width, but that isn't viable for emulating a 32bit CPU on a 32bit CPU.) – Peter Cordes Mar 26 '16 at 14:42
1

Faster than saving the result and carry out for lazy evaluation of flags is not storing anything at all! - that is what JIT emulators (https://www.wikiwand.com/en/Just-in-time_compilation) often do, they do liveness (data-flow) analysis (https://www.wikiwand.com/en/Live_variable_analysis) to determine which flags might be needed, and do lazy evaluation only for those. For example, an x86 ADD followed by CMP, the flags generated by ADD are overwritten by CMP... so the ADD doesn't need to do anything with the flags. Here's a 28 year old patent on this http://www.google.com/patents/US4951195 – amdn Mar 28 '16 at 08:33

score 0 · Accepted Answer · answered Mar 26 '16 at 12:48

0

I appear to have solved the problem, by splitting the arguments to update flags into an unsigned and signed result as follows:

void update_flags(int16_t unsigned_result, int16_t signed_result)
{
    Registers::flags.zero = unsigned_result == 0;
    Registers::flags.sign = signed_result < 0;
    Registers::flags.carry = unsigned_result < 0 || unsigned_result > UINT8_MAX;
    Registers::flags.overflow = signed_result < INT8_MIN || signed_result > INT8_MAX
}

For addition (which should produce the correct result for both signed & unsigned inputs) I would do the following:

int8_t a, b;
int16_t signed_result = int16_t(a) + int16_t(b);
int16_t unsigned_result = int16_t(uint8_t(a)) + int16_t(uint8_t(b));
update_flags(unsigned_result, signed_result);
int8_t c = a + b;

And signed multiplication I would do the following:

int8_t a, b;
int16_t result = int16_t(a) * int16_t(b);
update_flags(result, result);
int8_t c = a * b;

And so on for the other operations that update the flags

Note: I am assuming here that int16_t(a) sign extends, and int16_t(uint8_t(a)) zero extends.

I have also decided against having a parity flag, my _mm_popcnt_u32 solution should work if I change my mind later..

P.S. Thank you to everyone who responded, it was very helpful. Also if anyone can spot any mistakes in my code, that would be appreciated.

answered Mar 26 '16 at 12:48

Isaac

816
5
12

When two negative int8_t values are added such that the sum cannot be represented (signed_result < INT8_MIN) there is overflow, and the code correctly sets OF=1 but then it indicates the result is negative, SF=1, which is not the case in the x86-64 architecture. In x86 SF reflects the most significant bit of the result. For example, if a=-120 and b=-9 the result of the 8-bit addition will be 127, a positive number (SF=0). – amdn Mar 28 '16 at 07:51
As this interpreter is one for my own assembly language, I have defined the SF to indicate the sign of the 'mathematicly correct' result (assuming the input operands are signed numbers), not the sign of the result when represented in 8-bits. (I am aware that in x86, the SF is merely the most significant bit of the result, when truncated to the destination size) – Isaac Mar 28 '16 at 09:01
1

That is not what your first sentence in the question says "I want to be able to set the carry, parity, zero, sign and overflow flags as they are set in the x86-64 architecture, when I perform operations such as addition." – amdn Mar 28 '16 at 09:40
1

@amdn: I found this question unclear, too. I wasn't sure whether the OP was inventing a new ISA or not, given that sentence. So I wasn't sure if my suggestion to omit PF entirely was useful (since it's expensive to compute in C). – Peter Cordes Mar 28 '16 at 09:46
Is the Zero flag after addition set to true if and only if both operands are zero? – amdn Mar 28 '16 at 12:18
@amdn Hmm good point, well my ISA sets it to 1 if and only if the mathematically correct result is 0, for instructions such as ADD and SUB which operate the same for signed and unsigned numbers it is the mathematical result if the operands are assumed to be unsigned. So to answer your question, yes for ADD and MUL instructions, but not no for others. (perhaps I should instead have separate signed & unsigned versions of ADD and SUB?) – Isaac Mar 28 '16 at 12:49
1

Consider this C function: `int8_t div( int8_t a, int8_t b, int8_t c ) { int8_t sum = b + c; if (sum == 0) return 0; else return a / sum; }` Now, what should a compiler generate for this function with your ISA as the target? – amdn Mar 28 '16 at 14:10
@amdn Technically according to the c-standard, if `b + c` does not fit into an `int8_t` It's undefined behaviour, but I assume you wan't me to simply truncacte, in that case: `Pop r3; r3 = c; Pop r2; r2 = b; Pop r1; r1 = a; Add r2, r3; b += c; Compare r2, 0; This updates the flags as if computing 'r2 - 0'; If-NotEqual; If the zero flag is not set; Jump LABEL1;; Push 0;; Return; I.e. 'return 0'; LABEL: Divide r1, r2; r1 /= r2; Push r1;; Return; Return r1;` Note: Comments start with a `;` and end at the end of a line, or another `;` – Isaac Mar 28 '16 at 14:25
I think you meant implementation-defined behavior, not undefined behavior. The expression `b + c` is evaluated by first promoting both operands to integer (guaranteed to be at least 16 bits) and then converting to int8_t to perform the assignment to `sum`. No overflow is possible in the addition, and the conversion from int to int8_t is implementation-defined. – amdn Mar 28 '16 at 23:57
@amdn: actually signed overflow really is undefined behaviour. Compilers can make significantly better code by assuming it doesn't happen. (e.g. pretend a loop counter is a full register wide, so it can be used for array indexing.) This blog post explains in more detail: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html – Peter Cordes Mar 30 '16 at 18:47
1

@PeterCordes: in general overflow can happen during conversion (implementation defined behavior) or during evaluation (undefined behavior). In this case we first have two conversions (the promotion to `int` of the int8_t addition operands) and there is obviously no overflow there. Then the sum is performed, no overflow because the sum of two 8-bit signed quantities fits in 9 bits and an `int` is guaranteed to be at least 16 bits. The last conversion is the assignment of the result of the sum (an int) to int8_t - that conversion might overflow, and behavior is implementation defined. – amdn Mar 30 '16 at 22:02
1

@amdn: oops I was forgetting about integer promotion to `int` even when both operands are the same type, and skimmed over that part of your earlier comment, xD. Agreed on your conclusion. – Peter Cordes Mar 30 '16 at 23:56
1

@PeterCordes integer promotion and other implied conversions and what is implementation defined vs undefined is an area where I have to keep going back to the standard... every once in a while I still go *what?* like here (where I was definitely confused): http://stackoverflow.com/questions/24795651/whats-the-best-c-way-to-multiply-unsigned-integers-modularly-safely#comment38509572_24795651 – amdn Mar 31 '16 at 00:12

Reading CF, PF, ZF, SF, OF

2 Answers2

Linked