Many forum threads over at http://realworldtech.com/ have debated how much the "x86 tax" costs x86 CPUs in terms of transistor count / performance / power, vs. a simple-to-decode ISA like MIPS.
10% is a number that has been thrown around as a wild guess. Some of that cost is fixed and doesn't scale as you make the CPU more powerful: e.g. it takes maybe 3 extra pipeline stages to decode x86 instructions into a stream of uops that are similar in complexity to separate MIPS instructions. An ADD with a memory destination might decode into a load, an ADD, and a store, as in the sketch below. (Micro-fusion in some parts of the pipeline makes it more complicated than that.)
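To make that concrete, here's a minimal NASM-syntax sketch (the register choice is arbitrary) of a memory-destination add and the rough load / ALU / store breakdown the decoders have to produce:

```asm
; one x86 instruction, but roughly three separate operations:
add dword [rdi], 1      ; decodes to approximately:
                        ;   load:  tmp         <- dword [rdi]
                        ;   add:   tmp         <- tmp + 1
                        ;   store: dword [rdi] <- tmp
                        ; a load/store ISA like MIPS would use three
                        ; separate instructions (lw / addiu / sw) here
```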
Decoding variable-length x86 instructions in parallel (up to 4 per clock in current designs) is very power-intensive. x86 isn't just variable-length: determining the length of an instruction (i.e. where the next one starts) requires looking at a lot of bits, because there are optional prefixes and various other complexities. Agner Fog's blog post about the "instruction set war" between Intel and AMD discusses some of the costs of the messy state of x86's opcode coding space. (See also his microarch pdf to learn about the pipelines in modern x86 designs from AMD and Intel; it's aimed at finding bottlenecks in real code / understanding performance counters, but it's also interesting if you're just curious how CPUs work.)
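A quick illustration of why length-finding is hard: the total length depends on prefixes, the opcode, the ModRM byte, and the immediate size, so a decoder can't know where one instruction ends without partially decoding it. A few NASM examples with their encodings (64-bit mode):

```asm
bits 64
nop                         ; 90                            -> 1 byte
mov eax, ecx                ; 89 C8                         -> 2 bytes
mov ax, cx                  ; 66 89 C8                      -> 3 bytes (0x66 operand-size prefix)
mov eax, 1                  ; B8 01 00 00 00                -> 5 bytes (imm32)
lock add dword [rdi], 1     ; F0 83 07 01                   -> 4 bytes (0xF0 LOCK prefix)
mov rax, 0x123456789abcdef0 ; 48 B8 F0 DE BC 9A 78 56 34 12 -> 10 bytes (REX.W prefix + imm64)
```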
The cost of decoding x86 instructions is so high that Intel's Sandybridge microarchitecture family uses a small/fast decoded-uop cache as well as a traditional L1 I-cache. Even large loops usually fit in the uop cache, saving power and increasing front-end throughput vs. running from the legacy decoders. Most other ISAs can't get nearly as much benefit from a decoded-instruction cache, so they don't use one. (Intel previously experimented with a decoded-uop trace cache in Pentium 4 (without an L1 I-cache, and with weaker decoders), but SnB's uop cache is not a trace cache and the legacy decoders are still fast enough.)
OTOH, some of x86's legacy baggage (like partial FLAGS updates) imposes a cost on the rest of the pipeline and the out-of-order core. Modern x86 CPUs do have to rename different parts of FLAGS separately, to avoid false dependencies in something like DEC / JNZ (where DEC doesn't modify CF). Intel experimented with not doing this (in Pentium 4, aka the NetBurst microarchitecture family). They thought they could force everyone to recompile their code with compilers that avoided INC/DEC and used add eax, 1 instead (which does write all the flags); see the sketch below. (That optimization advice stuck around for ages in their official optimization manual, long after P4 was obsolete, and many people still think it's relevant.)
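A minimal NASM sketch of the partial-flags issue (the loops are just illustrative):

```asm
; version 1: dec writes ZF/SF/OF/AF/PF but leaves CF unmodified, so the
; full architectural FLAGS result is a merge of dec's output with the
; *old* CF.  CPUs that rename CF separately make this free; P4 didn't.
count_down:
    dec ecx
    jnz count_down          ; reads only ZF

; version 2 (the P4-era advice): sub/add write all the flags, so the
; new FLAGS value has no dependency on the previous FLAGS at all.
count_down2:
    sub ecx, 1
    jnz count_down2
```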
Some people argue that x86's strong memory-ordering semantics should be counted as part of the "x86 tax", because they reduce the parallelism a pipelined CPU can exploit. Others (e.g. Linus Torvalds) would argue that having the hardware do it for you means you don't need barrier instructions all over the place in multi-threaded code, and that making barrier instructions "cheap" (not a full flush of the store buffer or whatever) requires the hardware to track memory ordering in enough detail that it might as well just make the ordering implicit.
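As a rough sketch of what strong ordering buys you (NASM syntax; the `data` / `ready` labels are hypothetical, and the SysV calling convention is assumed): publishing a value to another thread on x86 needs no barrier instruction at all, because x86-TSO already keeps stores in program order.

```asm
section .bss
data:   resd 1
ready:  resd 1

section .text
global publish
publish:                        ; producer side: publish(value)
    mov [rel data], edi         ; plain store: the payload
    mov dword [rel ready], 1    ; plain store: the "ready" flag
    ; no barrier needed between the two stores: x86 guarantees they
    ; become globally visible in program order.  A weakly-ordered ISA
    ; would need a store barrier (or a release store) here.
    ret
```

Only sequential consistency costs extra on x86 (an mfence or a locked operation after the store); plain acquire/release ordering comes for free.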