You appear to be asking about emulating x86, not virtualizing it. Since modern x86 hardware supports virtualization, where the CPU runs guest code natively and only traps to the hypervisor for some privileged instructions, that's what the term "virtualization" normally means.
Lazy flag evaluation is typical. Instead of actually calculating all the flags after every instruction, just save the operands from the last instruction that set flags. Then if something actually reads the flags, figure out what the flag values need to be.
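As a rough illustration of the idea (a minimal sketch, not taken from any real emulator; the names lazy_flags_t, emu_add32, emu_sub32 and the get_* helpers are all made up for this example), the instruction handlers only record what happened, and each flag is worked out on demand:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical lazy-flags state: which operation ran last, and on what. */
typedef enum { OP_ADD32, OP_SUB32 } lazy_op_t;

typedef struct {
    lazy_op_t op;   /* last flag-setting operation */
    uint32_t  a, b; /* its source operands */
    uint32_t  res;  /* its result */
} lazy_flags_t;

/* Instruction handlers: do the arithmetic, record the inputs, compute no flags. */
static inline uint32_t emu_add32(lazy_flags_t *lf, uint32_t a, uint32_t b) {
    lf->op = OP_ADD32; lf->a = a; lf->b = b; lf->res = a + b;
    return lf->res;
}

static inline uint32_t emu_sub32(lazy_flags_t *lf, uint32_t a, uint32_t b) {
    lf->op = OP_SUB32; lf->a = a; lf->b = b; lf->res = a - b;
    return lf->res;
}

/* Flag readers: only run when guest code actually consumes a flag. */
static inline bool get_zf(const lazy_flags_t *lf) { return lf->res == 0; }
static inline bool get_sf(const lazy_flags_t *lf) { return lf->res >> 31; }

static inline bool get_cf(const lazy_flags_t *lf) {
    switch (lf->op) {
    case OP_ADD32: return lf->res < lf->a; /* unsigned wraparound */
    case OP_SUB32: return lf->a < lf->b;   /* borrow */
    }
    return false;
}

static inline bool get_of(const lazy_flags_t *lf) {
    switch (lf->op) {
    case OP_ADD32: /* operands agree in sign, result differs */
        return ((lf->a ^ lf->res) & (lf->b ^ lf->res)) >> 31;
    case OP_SUB32: /* operands differ in sign, result differs from a */
        return ((lf->a ^ lf->b) & (lf->a ^ lf->res)) >> 31;
    }
    return false;
}
```

A conditional-branch handler would call only the readers it needs (e.g. get_zf for JE), so a long run of arithmetic instructions with no flag consumers pays just for the stores.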
This means you don't actually have to calculate PF and AF every time they're written (almost every instruction), only every time they're read. Reads are mostly only PUSHF or interrupts; hardly any code ever reads PF, except for FP branches where it means NaN (an unordered compare result). Computing PF after every integer instruction is expensive in pure C, since it requires a popcount on the low 8 bits of the result. (And I think C compilers generally don't manage to recognize that pattern and use setp themselves, let alone a pushf or lahf to store multiple flags, if compiling an x86 emulator to run on an x86 host. They do sometimes recognize population-count patterns and emit popcnt instructions, though, when targeting host CPUs that have that feature, e.g. -march=nehalem.)
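For reference, this is roughly what an eager PF computation costs in portable C. It's a sketch with a made-up function name (parity_flag), and it uses XOR-folding rather than a literal popcount: folding the byte onto itself leaves the low bit of the popcount in bit 0, which is all PF needs.

```c
#include <stdint.h>
#include <stdbool.h>

/* PF is set when the low 8 bits of the result contain an even number of 1 bits.
 * Bits above the low byte are ignored, as on real x86. */
static inline bool parity_flag(uint32_t result) {
    unsigned low = result & 0xFFu;
    low ^= low >> 4; /* fold 8 bits down to 4 */
    low ^= low >> 2; /* fold 4 bits down to 2 */
    low ^= low >> 1; /* bit 0 is now the XOR of all 8 bits */
    return (low & 1u) == 0;
}
```

That's several dependent shift/XOR operations per emulated instruction if done eagerly, which is the cost lazy evaluation avoids.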
BOCHS uses this technique, and describes the implementation in some detail in the Lazy Flags section of this short pdf: How Bochs Works Under the Hood 2nd edition. They save the result so they can derive ZF, SF, and PF, and they save the carry-outs: the carry-out from the high 2 bits gives CF and OF, and the carry-out from bit 3 gives AF. With this, they never need to replay an instruction to compute its flag results.
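Here's a minimal sketch of that result-plus-carry-out scheme for a 32-bit ADD. It is written from the description above, not copied from the Bochs source, and the names (lazy_flags32_t, add32_lazy, lf_*) are hypothetical. The carry-out of every bit position is recovered with the usual majority identity, so one extra word of state covers CF, OF and AF:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t result; /* result of the last flag-setting instruction */
    uint32_t cout;   /* bit i = carry out of bit position i */
} lazy_flags32_t;

static inline uint32_t add32_lazy(lazy_flags32_t *lf, uint32_t a, uint32_t b) {
    uint32_t res = a + b;
    lf->result = res;
    /* Carry out of bit i: both inputs set, or one set and the sum bit clear. */
    lf->cout = (a & b) | ((a | b) & ~res);
    return res;
}

/* From the saved result. */
static inline bool lf_zf(const lazy_flags32_t *lf) { return lf->result == 0; }
static inline bool lf_sf(const lazy_flags32_t *lf) { return lf->result >> 31; }
static inline bool lf_pf(const lazy_flags32_t *lf) {
    unsigned low = lf->result & 0xFFu;
    low ^= low >> 4; low ^= low >> 2; low ^= low >> 1;
    return (low & 1u) == 0; /* even number of set bits in the low byte */
}

/* From the saved carry-outs. */
static inline bool lf_cf(const lazy_flags32_t *lf) { return (lf->cout >> 31) & 1u; }
static inline bool lf_af(const lazy_flags32_t *lf) { return (lf->cout >> 3) & 1u; }
static inline bool lf_of(const lazy_flags32_t *lf) {
    /* Signed overflow: carry into the top bit differs from carry out of it. */
    return ((lf->cout >> 31) ^ (lf->cout >> 30)) & 1u;
}
```

SUB would need its own borrow-vector expression, and 8/16-bit operations would look at different bit positions for CF and OF; those are left out to keep the sketch short.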
There are extra complications from some instructions not writing all the flags (i.e. partial-flag updates), and presumably from instructions like BSF that set ZF based on the input not the output.
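To make the partial-update problem concrete: INC writes OF, SF, ZF, AF and PF but leaves CF alone, so the old CF has to survive when the lazy state is overwritten. One simple way to handle it (an assumption for illustration, not necessarily what Bochs or any other emulator does; all names here are made up) is to resolve CF before clobbering its source and pin the value:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t result;    /* result of the last flag-setting instruction */
    uint32_t cout;      /* per-bit carry-outs of that instruction */
    bool     cf_pinned; /* true: CF comes from cf_value, not from cout */
    bool     cf_value;  /* CF preserved across a partial-flag update */
} pu_flags_t;

static inline uint32_t pu_cout_add(uint32_t a, uint32_t b, uint32_t res) {
    return (a & b) | ((a | b) & ~res);
}

static inline bool pu_get_cf(const pu_flags_t *f) {
    return f->cf_pinned ? f->cf_value : (bool)((f->cout >> 31) & 1u);
}

static inline bool pu_get_of(const pu_flags_t *f) {
    return ((f->cout >> 31) ^ (f->cout >> 30)) & 1u;
}

/* ADD writes every arithmetic flag, so any pinned CF is discarded. */
static inline uint32_t pu_add32(pu_flags_t *f, uint32_t a, uint32_t b) {
    uint32_t res = a + b;
    f->result = res;
    f->cout = pu_cout_add(a, b, res);
    f->cf_pinned = false;
    return res;
}

/* INC leaves CF unchanged: resolve the old CF first, then pin it. */
static inline uint32_t pu_inc32(pu_flags_t *f, uint32_t a) {
    bool old_cf = pu_get_cf(f);
    uint32_t res = a + 1u;
    f->result = res;
    f->cout = pu_cout_add(a, 1u, res);
    f->cf_pinned = true;
    f->cf_value = old_cf;
    return res;
}
```

The cost is that INC has to evaluate the previous CF eagerly, which gives up some of the laziness; that trade-off is the kind of complication mentioned above.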
Further reading:
This paper on emulators.com gives a lot of details on how to efficiently save enough state to reconstruct flags. See its section "2.1 Lazy Arithmetic Flags for CPU Emulation".
One of the authors is Darek Mihocka (long-time emulator writer, now working at Intel apparently). He has written much interesting stuff about making non-JIT emulators run fast, and CPU performance stuff in general, much of it posted on his site, http://www.emulators.com/. E.g. this article about avoiding branch mispredictions in an emulator's interpreter loop, which dispatches to functions that implement each opcode, is quite interesting. Darek is also the co-author of that article about BOCHS internals I linked earlier.
A google hit for lazy flag eval may also be relevant: https://silviocesare.wordpress.com/2009/03/08/lazy-eflags-evaluation-and-other-emulator-optimisations/
Last time emulation of x86-like flags came up, the discussion in comments on my lazy-flags answer had some interesting stuff: e.g. @Raymond Chen suggested that link to the Mihocka & Troeger paper, and @amdn pointed out that JIT dynamic translation can produce faster emulation than interpretation.