28

In this article, John Dvorak calls Itanium "one of the great fiascos of the last 50 years". While he describes the over-optimistic market expectations and the dramatic financial outcome of the idea, he doesn't go into the technical details of this epic fail. I had the chance to work with Itanium for some time, and I personally loved its architecture: it was so clear, simple, and straightforward in comparison to modern x86 processor architecture...

So what are (or were) the technical reasons for its failure? Under-performance? Incompatibility with x86 code? Complexity of compilers? Why did this "Itanic" sink?

[Image: Itanium processor block diagram]

Massimiliano
  • You'd get better answers at serverfault.com, methinks. ;) – Jon Limjap Jun 18 '09 at 09:46
  • 1
    In my opinion it is very "programming-related", because whatever we program gets executed by that processor-thingie inside the machines. So you have to know how and why it works at least a little. – Massimiliano Jun 18 '09 at 09:49
  • Not really. The entire PC is needed for programming, yet PC configuration questions belong on serverfault.com and not stackoverflow.com. – OregonGhost Jun 18 '09 at 09:50
  • 3
    Processor architecture has a lot to do with programming. I learned a lot about OSes by reading the ARM reference manual. – shodanex Jun 18 '09 at 09:53
  • 1
    @OregonGhost: this is not a PC configuration question. – Massimiliano Jun 18 '09 at 09:55
  • And what's the problem with the Itanium tag???? Is it exactly for the topic, is it a non-programming-related tag, or what? – Massimiliano Jun 18 '09 at 10:06
  • 5
    this is really programming related - just because it mentions hardware does not make it server fault material – 1800 INFORMATION Jun 18 '09 at 10:07
  • @Yacoder: No, it isn't, but you said that questions about processor-related things are programming-related because you need a processor to execute programs. I said you also need the rest of the PC to execute programs, yet most PC-related questions are closed because they are not programming related. So what exactly makes this any more programming-related than any other question about PC components? – OregonGhost Jun 18 '09 at 10:28
  • 3
    Closed again. And downvoted. Why?? "True" programmers don't need to know the architecture of the machines executing their code??? Sad. – Massimiliano Dec 12 '12 at 05:04
  • 1
    [Why was the Itanium processor difficult to write a compiler for?](https://softwareengineering.stackexchange.com/q/279334/98103) – phuclv Jun 30 '18 at 15:19

6 Answers

32

Itanium failed because VLIW for today's workloads is simply an awful idea.

Donald Knuth, a widely respected computer scientist, said in a 2008 interview that "the Itanium approach [was] supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write." [1]

That pretty much nails the problem.

For scientific computation, where you get at least a few dozen instructions per basic block, VLIW probably works fine. There are enough instructions there to create good bundles. For more modern workloads, where you often get only about 6-7 instructions per basic block (that's the average for SPEC2000, IIRC), it simply doesn't work. The compiler can't find enough independent instructions to put in the bundles.
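
To make this concrete, here is a rough sketch (a hypothetical illustration of mine, not code from any real measurement) contrasting the two kinds of code. The linked-list search stands in for a "modern" workload: a tiny basic block whose instructions form one serial dependency chain. The saxpy loop stands in for scientific code, where iterations are independent and a compiler can unroll them to fill wide bundles:

```c
/* Hypothetical "modern" workload: pointer chasing. The compare depends
 * on a load, and the next iteration depends on the n = n->next load,
 * so a VLIW compiler has almost nothing independent to pack into the
 * remaining bundle slots. */
struct node { int key; struct node *next; };

int find(const struct node *n, int key) {
    while (n) {
        if (n->key == key)      /* depends on the load of n->key      */
            return 1;
        n = n->next;            /* next iteration waits on this load  */
    }
    return 0;
}

/* Contrast: a scientific kernel. Iterations are independent, so the
 * compiler can unroll and fill the bundles with parallel work. */
void saxpy(float *y, const float *x, float a, int n) {
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```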

Modern x86 processors, with the exception of Intel Atom (pre-Silvermont) and, I believe, AMD's E-3**/4** series, are all out-of-order processors. They maintain a dynamic instruction window of roughly 100 instructions, and within that window they execute instructions whenever their inputs become ready. If multiple instructions are ready and they don't compete for resources, they go together in the same cycle.

So how is this different from VLIW? The first key difference is that an out-of-order processor can choose instructions from different basic blocks to execute at the same time. Those instructions are executed speculatively anyway (based primarily on branch prediction). The second key difference is that out-of-order processors determine these schedules dynamically (i.e., each dynamic instruction is scheduled independently, whereas the VLIW compiler operates on static instructions).
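
Here is another hypothetical sketch (again mine, purely illustrative) of why the dynamic part matters:

```c
/* The two accumulator chains below are independent. An out-of-order
 * core discovers that at run time and overlaps them, even when p[i]
 * misses in cache while q[i] hits; it keeps making progress on the
 * other chain. A VLIW compiler must prove the independence statically
 * and bake one fixed schedule into the bundles, and that schedule
 * cannot adapt when a load unexpectedly takes 100+ cycles. */
long sum_two_arrays(const long *p, const long *q, int n) {
    long a = 0, b = 0;
    for (int i = 0; i < n; i++) {
        a += p[i];   /* dependency chain 1 */
        b += q[i];   /* dependency chain 2, independent of chain 1 */
    }
    return a + b;
}
```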

The third key difference is that implementations of out-of-order processors can be as wide as desired without changing the instruction set (Intel Core has 5 execution ports, other processors have 4, etc.). VLIW machines can and do execute multiple bundles at once (if they don't conflict). For example, early Itanium CPUs executed up to 2 VLIW bundles per clock cycle (6 instructions), with later designs (2012's Poulson and later) running up to 4 bundles (12 instructions) per clock, with SMT to take those instructions from multiple threads. In that respect, real Itanium hardware is like a traditional in-order superscalar design (like P5 Pentium or Atom), but with more / better ways for the compiler to expose instruction-level parallelism to the hardware (in theory, if it can find enough, which is the problem).
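
For reference, here is a sketch of the bundle layout as I recall it from the IA-64 manuals (the C accessors are purely illustrative, not any real decoder): each 128-bit bundle packs three 41-bit instruction slots plus a 5-bit template that tells the hardware which unit types the slots need and where the "stops" between independent instruction groups fall.

```c
#include <stdint.h>

/* One 128-bit IA-64 bundle, kept as two 64-bit halves.
 * Layout (low bit to high): template bits 0..4, slot 0 bits 5..45,
 * slot 1 bits 46..86, slot 2 bits 87..127. */
typedef struct { uint64_t lo, hi; } ia64_bundle;

static inline unsigned template_of(ia64_bundle b) {
    return (unsigned)(b.lo & 0x1f);             /* bits 0..4   */
}
static inline uint64_t slot0(ia64_bundle b) {
    return (b.lo >> 5) & ((1ULL << 41) - 1);    /* bits 5..45  */
}
static inline uint64_t slot1(ia64_bundle b) {
    /* bits 46..86 straddle the two 64-bit halves */
    return ((b.lo >> 46) | (b.hi << 18)) & ((1ULL << 41) - 1);
}
static inline uint64_t slot2(ia64_bundle b) {
    return (b.hi >> 23) & ((1ULL << 41) - 1);   /* bits 87..127 */
}
```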

Performance-wise, with similar specs (caches, cores, etc.), out-of-order x86 processors just beat the crap out of Itanium.

So why would one buy an Itanium now? Well, the only reason really is HP-UX. If you want to run HP-UX, that's the way to do it...

Many compiler writers don't see it this way - they always liked the fact that Itanium gives them more to do, puts them back in control, etc. But they won't admit how miserably it failed.


Footnote 1:

This was part of a response about the value of multi-core processors. Knuth was saying parallel processing is hard to take advantage of; finding and exposing fine-grained instruction-level parallelism (and explicit speculation: EPIC) at compile time for a VLIW is also a hard problem, and somewhat related to finding coarse-grained parallelism to split a sequential program or function into multiple threads to automatically take advantage of multiple cores.

Eleven years later, he's still basically right: per-thread performance is still very important for most non-server software, and something CPU vendors focus on, because many cores are no substitute.

Peter Cordes
Vlad Petric
    Thanks. A great answer! The question waited for you so long :-) As for the quote, I believe it is from Donald Knuth: http://www.informit.com/articles/article.aspx?p=1193856 – Massimiliano Dec 12 '12 at 05:02
  • 1
    Why has no one made an architecture where instructions carry additional info (about dependencies, etc.) to make out-of-order easier/cheaper? Sort of the best of both approaches. – Aleksandr Dubinsky Jan 02 '13 at 10:33
  • Yacoder, thanks for the missing ref :) – Vlad Petric Feb 26 '13 at 22:16
  • 1
    Aleksandr, there are multiple parts to the answer. It is, I guess, technically possible to enhance out-of-order execution this way, though I'm not aware of solid approaches. Several issues: a) add something to the instruction set, and you need to support it even if it makes no sense anymore (e.g., delayed branch slots). b) dynamic predictors tend to do a good job (e.g., store-load dependency prediction) and apply to all code, retroactively too. c) you need some significant improvements to justify an instruction set change like this. – Vlad Petric Feb 26 '13 at 22:18
  • Aleksandr, as an aside, dataflow architectures have all dependencies explicit. Having all dependencies explicit, however, restricts your programming (no regular memory). Hybrids between von-Neumann and dataflow do exist (Wavescalar) – Vlad Petric Feb 26 '13 at 22:23
8

Simple. It wasn't x86 compatible. That's why x86_64 chips are.

SpliFF
  • 3
    Well, PowerPC chips are not x86 compatible, but they aren't a fiasco, at least in High Performance Computing. So there must be a better explanation... – Massimiliano Jun 18 '09 at 10:09
  • 3
    No. PowerPC worked because Apple worked very hard to provide an emulation layer for 68000 code. Same again when they moved to Core Duo. At each change a large percentage of existing software continued to run. No existing software ran on Itanium, which was entirely the cause of its downfall. – SpliFF Jun 18 '09 at 10:23
  • 4
    Let me put it another way. At the time of release software developers were waiting for a decent marketshare before writing software for it and PC buyers were waiting for a decent amount of software before buying. – SpliFF Jun 18 '09 at 10:29
  • It's valid. But still, the market share for Itaniums in HPC was growing for some period. So this initial problem of "chicken and egg" seemed to be solved. http://en.wikipedia.org/wiki/File:Top500.procfamily.png – Massimiliano Jun 18 '09 at 12:09
  • 1
    Windows on Itanium has a WoW layer to run x86 applications. Many versions of Itanium even have a small x86 CPU inside to run x86 code. – phuclv Jun 30 '18 at 15:06
5

Itanium's design rested on the philosophy of very wide instruction-level parallelism to scale processor performance when clock frequency limits are imposed by thermal constraints.

But AMD's Opteron disrupted Itanium adoption by proliferating x86_64 cores, achieving scalable performance while staying compatible with 32-bit x86 binaries.

Itanium servers are about 10x more expensive than x86 servers with a similar processor count.

All of these factors slowed the adoption of Itanium servers in the mainstream market. Itanium's main market now is mission-critical enterprise computing, a good $10B+/year market dominated only by HP, IBM, and Sun.

Pari Rajaram
3

It was very hard to write code generators for, and it didn't have many reasons to succeed in the first place (it was made by Intel, but so what?).

I've heard that some JITs gave worse performance than interpreters on Itanium because gcc optimized the interpreter better; that's a no-go if a processor requires that level of optimization.

Non-mainstream RISCs are losing ground; they didn't see that, or hoped Itanium would become mainstream. Too bad it didn't; there was never any reason it would.

alamar
3

I read that article, and I'm completely missing the "fiasco" he refers to. As he mentions near the end, at the mere sight of Itanium, "one promising project after another was dropped". MIPS, Alpha, PA-RISC -- gone. Sun has cancelled its last two big SPARC projects, though SPARC wasn't exactly a big seller even before those. PowerPC survives only in the embedded space.

How is Intel killing off all the competition, using a single product line, anything but the greatest microprocessor victory of all time? I'm sure they weren't smart enough to have anticipated this, but even if they knew it would fail, throwing a few $billion at a feint worked wonderfully. Apparently they could afford it, and everybody else just dropped dead.

On the desktop, in the server room, and even in supercomputers (87% of the top-500 list), it's x86-compatible as far as the eye can see. If that's the result of an Intel "fiasco", then what words are left for the processors that didn't make it?

Intel and Itanium, in my book, rank up there with Microsoft and MS-DOS: despite how lousy it may have been technically, it enabled them to utterly dominate the industry.

EDIT: And Itanium had x86 compatibility from day 1, so that's not it. It was slow, but it was there.

Ken
  • 1
    AFAIR, he wasn't talking about Intel's fiasco, only about the "Itanium project" fiasco... – Massimiliano Sep 07 '09 at 09:50
  • 3
    Would you call MS-DOS a fiasco, then? It was also an accident involving a technically inferior product that led directly to a huge monopoly for years. – Ken Sep 08 '09 at 19:00
0

I think Itanium still has its market - high-end systems and HP blade servers. Performance is still much higher compared to x86. I'm not sure why someone would call it a failure when it is generating billions of dollars for HP (although it is not just the processor; it is Itanium server sales that generate the revenue).