7

The term Von Neumann languages is applied to programming languages whose computational model is based on the Von Neumann computer architecture.

[figure: diagram of the Von Neumann architecture]

  • Is C++ considered a Von Neumann language? And if it's not (e.g., because of asynchronous execution with the advent of threads), was it ever considered one?
  • Is there an architecture that C++'s computational model/abstract machine is based on, so that C++ can be classified as a language of that architecture?
101010
  • 6
    The wikipedia link you posted states: "Many widely used programming languages such as C, C++ and Java have ceased to be strictly von Neumann by adding support for parallel processing, in the form of threads." – Rotem Oct 09 '19 at 21:39
  • @Rotem Kindly read the questions below the picture. P.S. I've read the article before posting the question. – 101010 Oct 09 '19 at 21:42
  • 4
    Why does it matter? – Barry Oct 09 '19 at 21:44
  • C is designed around a register machine with RAM, like a PDP-11. That's why it has local vars, pointers, and array indexing. Most C implementations also depend on a stack for automatic storage, for when locals need to be spilled from registers and for recursion. But C/C++ have managed to avoid requiring a callstack; other implementations are possible for function context. And BTW, you don't need to specify a single ISA, just the ["model of computation"](https://en.wikipedia.org/wiki/Model_of_computation) / abstract machine type like https://en.wikipedia.org/wiki/Register_machine – Peter Cordes Oct 09 '19 at 21:46
  • @Barry I thought for you every detail matters. :P – 101010 Oct 09 '19 at 21:47
  • 1
    One can care a great deal about every detail of the language and how to use it, without caring a single iota about the name somebody might use to try to characterize the model of computation they see it as embodying. – Jerry Coffin Oct 09 '19 at 21:50
  • @JerryCoffin In my humble opinion names are important. Furthermore, one might wonder, what's the computational model that the C++ abstract machine simulates? ;) – 101010 Oct 09 '19 at 21:55
  • @PeterCordes Exactly what my question focuses on. What's the computational model for C++'s abstract machine, if not Von Neumann's? – 101010 Oct 09 '19 at 22:00
  • 3
    @101010: "*In my humble opinion names are important.*" And what makes this particular name important? "*what's the computational model that C++ abstract machine simulates?*" The stuff that inspired you to make a choice 30+ years ago isn't necessarily relevant now that the choice has been made. What matters is the choice, not why it was made. The C++ standard defines how the abstract machine works; *that* is the "computational model". – Nicol Bolas Oct 09 '19 at 22:02
  • 7
    We complain that all we get these days are basically debugging questions, but when we get an actual interesting question to answer, all we care about is whether the question "matters" somehow? What's your guys' standard on whether the question "matters" or not? – Not a real meerkat Oct 09 '19 at 22:12
  • I would say the premise of the question doesn't really make sense. While most modern systems are Von Neumann, I see no reason that the typical compiled language (C, C++, whatever) could not be made to run just as readily on a Harvard architecture system. – SoronelHaetir Oct 09 '19 at 22:13
  • @101010 I read the text of the question, and my comment merely meant to state that this is specifically addressed in the article for the benefit of other readers. So, I would suggest you clarify what you are after? An opinionated discussion? A more authoritative answer? Regardless I believe you should state why you believe the statement in the article may be misguided. – Rotem Oct 09 '19 at 22:20
  • @Rotem What the article states is already incorporated in "or if it's not (e.g., because of asynchronous execution with the advent of threads) was it ever considered a Von Neumann language?". – 101010 Oct 09 '19 at 22:24
  • 1
    @101010 "ceased to be" as written in the article implies that they were considered strictly VN languages, at least by the author of the article. Regardless, I find the question worthy of debate. – Rotem Oct 09 '19 at 22:29
  • 1
    @CássioRenan: Agreed. This is the first time I've ever *added* the tag `[computer-science]` to a question, rather than removing it from some debugging / engineering question (presumably posted by a student taking a class titled Computer Science) – Peter Cordes Oct 10 '19 at 04:55
  • @101010: I was working on a big update to my answer while you accepted it. On more careful reading, I realized that the computer science RAM abstract machine uses the term "registers" for what are really more like memory addresses. It's not like a CPU with registers + RAM. It's more abstract and further from modeling real CPUs than I thought. So there was a lot more ground to cover about what *abstract* models of computation are for, vs. the C++ abstract machine. Also, Von Neumann vs. Harvard is probably a red herring here; the dude has his name on so much stuff... see that para in my answer – Peter Cordes Oct 10 '19 at 09:23
  • 1
    @PeterCordes It looks like I have something for evening reading. Thanks ;) – 101010 Oct 10 '19 at 09:27

3 Answers

11

TL;DR: The C++ abstract machine is a type of PRAM (Parallel Random Access Machine).


From the Von Neumann Languages Wikipedia article you linked:

Many widely used programming languages such as C, C++ and Java have ceased to be strictly von Neumann by adding support for parallel processing, in the form of threads.

Cease describes a transition from being to not-being. So yes, before C++11 added threads, C++ was strictly a Von Neumann language according to Wikipedia. (And after that, it's still basically a VN language; having multiple threads sharing the same address-space doesn't fundamentally change how C++ works.)

The interesting parts of being a Von Neumann architecture in this context:

  • Having addressable RAM at all, allowing efficient access (modulo cache / paging) to any object at any time
  • Storing the program in RAM: function pointers are possible and efficient, without requiring an interpreter (see the short sketch after this list)
  • Having a program counter that steps through instructions in the stored program: The natural model is an imperative programming language that does one thing at a time. This is so fundamental that it's easy to forget it's not the only model! (vs. an FPGA or ASIC or something where all gates potentially do something in parallel every clock cycle. Or a MIMD GPU where a computational "kernel" you write is run over all the data potentially in parallel, without implicit sequencing of what order each element is processed in. Or Computational RAM: put ALUs in the memory chips to bypass the Von Neumann bottleneck)
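
A minimal sketch of the function-pointer point: on a stored-program machine, the address of code is just a value like any other, so selecting behaviour at runtime costs one indirect call:

```cpp
#include <cstdio>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

int main() {
    // A function pointer is ordinary data: the address of code in the
    // stored program. Switching behaviour is just storing a new value.
    int (*op)(int, int) = add;
    std::printf("%d\n", op(2, 3));   // 5
    op = mul;
    std::printf("%d\n", op(2, 3));   // 6
}
```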

IDK why the wiki article mentions self-modifying code, though; like most languages, ISO C++ doesn't standardize that and is fully compatible with ahead-of-time compilation for a split-bus / split-address-space Harvard architecture. (No eval or anything else that would require an interpreter or JIT.) Or, on a normal (Von Neumann) CPU, with strict W^X memory protection, never using mprotect to change page permissions from writeable to executable.

Of course most real C++ implementations do provide well-defined ways to write machine-code into a buffer and cast to a function pointer, as extensions. (e.g. GNU C/C++'s __builtin___clear_cache(start, end) is named for I-cache sync, but defined in terms of making it safe to call data as a function wrt. dead-store elimination optimizations as well, so it's possible for code to break without it even on x86 which has coherent I-caches.) So implementations can extend ISO C++ to take advantage of this feature of Von Neumann architectures; ISO C++ is intentionally limited in scope to allow for differences between OSes and stuff like that.
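
For instance, a hedged sketch of that kind of extension on x86-64 Linux (GNU extensions plus POSIX mmap, not ISO C++; strict W^X systems may refuse the writeable+executable mapping):

```cpp
#include <sys/mman.h>
#include <cstring>
#include <cstdio>

int main() {
    // x86-64 machine code for "return 42":  mov eax, 42 ; ret
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    void *buf = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;
    std::memcpy(buf, code, sizeof code);

    // Tell the compiler the stores above will be executed as code (so they
    // can't be optimized away as dead) and sync I-caches where needed.
    __builtin___clear_cache(static_cast<char*>(buf),
                            static_cast<char*>(buf) + sizeof code);

    int (*fn)() = reinterpret_cast<int (*)()>(buf);  // conditionally-supported cast
    std::printf("%d\n", fn());   // 42
    munmap(buf, sizeof code);
}
```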

Note that being Von Neumann does not strictly imply supporting indirect addressing modes. Some early CPUs didn't, and self-modifying code (to rewrite an address hard-coded in an instruction) was necessary to implement things that we now use indirection for.

Also note that John Von Neumann was a really famous guy, with his name attached to a lot of fundamental things. Some of the connotations of Von Neumann architecture (as opposed to Harvard) aren't really relevant in all contexts. e.g. the "Von Neumann language" term doesn't so much care about Von Neumann vs. Harvard; it cares about stored-program with a program counter vs. something like Cellular Automata or a Turing machine (with a real tape). Getting extra bandwidth by using a separate bus (or just split caches) to fetch instructions (Harvard) is just a performance optimization, not a fundamental change.


What is an abstract machine model / model of computation anyway?

First of all, there are some models of computation that are weaker than Turing machines, like Finite State Machines. There are also non-sequential models of computation, for example Cellular Automata (Conway's Game of Life), where multiple things happen in parallel at each "step".

The Turing machine is the most widely-known (and mathematically simple) sequential abstract machine that is as "strong" as we know how to make. Without any kind of absolute memory addressing, just relative movement on the tape, it naturally provides infinite storage. This is important, and makes all other kinds of abstract machines very unlike real CPUs in some ways. Remember, these models of computation are used for theoretical computer science, not engineering. Problems like finite amounts of memory or performance aren't relevant to what's computable in theory, only in practice.

If you can compute something on a Turing machine, you can compute it on any other Turing-complete model of computation (by definition), perhaps with a much simpler program or perhaps not. Turing machines aren't very nice to program, or at least very different from assembly language for any real CPU. Most notably, the memory isn't random-access. And they can't easily model parallel computing / algorithms. (If you want to prove things about an algorithm in the abstract, having an implementation of it for an abstract machine of some sort is probably a good thing.)

It's also potentially interesting to prove what features an abstract machine needs to have in order to be Turing complete, so that's another motivation for developing more of them.

There are many others that are equivalent in terms of computability. The RAM machine model is most like real-world CPUs that have an array of memory. But being a simple abstract machine, it doesn't bother with registers. In fact, just to make things more confusing, it calls its memory cells an array of registers. A RAM machine supports indirect addressing, so the correct analogy to real world CPUs is definitely to memory, not CPU registers. (And there are an unbounded number of registers, each of unbounded size. Addresses keep going forever and every "register" needs to be able to hold a pointer.) A RAM machine can be Harvard: program stored in a separate finite-state portion of the machine. Think of it like a machine with memory-indirect addressing modes so you can keep "variables" in known locations, and use some of them as pointers to unbounded-size data structures.

The program for an abstract RAM machine looks like assembly language, with load/add/jnz and whatever other selection of instructions you want it to have. The operands can be immediates or register numbers (what normal people would call absolute addresses). Or if the model has an accumulator, then you have a load/store machine with an accumulator a lot more like a real CPU.
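
To make that concrete, here's a minimal sketch of such a RAM machine as a C++ interpreter. The instruction encoding is my own toy invention (not any canonical formulation, and indirect addressing is omitted for brevity); the point is that the "registers" are memory cells addressed by number, and the program is stored separately from them, Harvard-style:

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

// Toy instruction set: operands are immediates or "register" numbers,
// i.e. addresses into an unbounded memory of unbounded integers.
enum class Op { LoadImm, AddImm, JnzDec, Halt };
struct Insn { Op op; std::uint64_t a, b; };

int main() {
    // A sparse map stands in for the unbounded array of "registers".
    std::map<std::uint64_t, std::uint64_t> reg;

    // Program (stored separately from data, Harvard-style): compute 5 * 3
    // by adding 3 into r1 five times, using r0 as the loop counter.
    std::vector<Insn> prog = {
        { Op::LoadImm, 0, 5 },  // r0 = 5
        { Op::LoadImm, 1, 0 },  // r1 = 0
        { Op::AddImm,  1, 3 },  // r1 += 3
        { Op::JnzDec,  0, 2 },  // if (--r0 != 0) goto instruction 2
        { Op::Halt,    0, 0 },
    };

    for (std::uint64_t pc = 0; ; ) {            // the program counter
        const Insn &i = prog[pc++];
        switch (i.op) {
            case Op::LoadImm: reg[i.a] = i.b;  break;
            case Op::AddImm:  reg[i.a] += i.b; break;
            case Op::JnzDec:  if (--reg[i.a] != 0) pc = i.b; break;
            case Op::Halt:
                std::printf("r1 = %llu\n", (unsigned long long)reg[1]);  // 15
                return 0;
        }
    }
}
```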

If you ever wondered why a "3-address" machine like MIPS was called that instead of 3-operand, it's probably 1. because the instruction encoding needs room / I-fetch bandwidth through the Von Neumann bottleneck for 3 explicit operand locations (register numbers) and 2. because in a RAM abstract machine, operands are memory addresses = register numbers.


C++ can't be Turing complete: pointers have a finite size.

Of course, C++ has huge differences from a CS abstract machine model: C++ requires every type to have a compile-time-constant finite sizeof, so C++ can't be Turing-complete if you include the infinite-storage requirement. Everything in Is C actually Turing-complete? on cs.SE applies to C++, too: the requirement that types have a fixed width is a showstopper for infinite storage. See also https://en.wikipedia.org/wiki/Random-access_machine#Finite_vs_unbounded
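
A trivial sketch of that fixed-width requirement:

```cpp
#include <climits>
#include <cstdio>

int main() {
    // sizeof(void*) is a compile-time constant, so a C++ implementation
    // can name at most 2^(CHAR_BIT * sizeof(void*)) distinct addresses:
    // bounded storage, unlike a Turing machine's unbounded tape.
    std::printf("pointers: %zu bits => finite address space\n",
                sizeof(void*) * CHAR_BIT);
}
```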


So Computer Science abstract machines are silly, what about the C++ Abstract machine?

They of course have their purposes, but there's a lot more interesting stuff we can say about C++ and what kind of machine it assumes if we get a bit less abstract and also talk about what a machine can do efficiently. Once we talk about finite machines and performance, these differences become relevant.

First, what it takes to run C++ at all, and second, what it takes to run it without huge and/or unacceptable performance overheads. (e.g. the HW will need to support pointers fairly directly, probably not with self-modifying code that stores the pointer value into every load/store instruction that uses it. And that wouldn't work in C++11, where threading is part of the language: the same code can be operating on 2 different pointers at once.)

We can look in more detail at the model of computation assumed by the ISO C++ standard, which describes how the language works in terms of what happens on the Abstract Machine. Real implementations are required to run code on real hardware that runs "as-if" the abstract machine were executing C++ source, reproducing any/all observable behaviour (observable by other parts of the program without invoking UB).

C/C++ has memory and pointers, so it's pretty definitely a type of RAM machine.

Or these days, a Parallel random-access machine, adding shared memory to the RAM model, and giving each thread its own program counter. Given that std::atomic<> release-sequences make all previous operations visible to other threads, the "establishing a happens-before relationship" model of synchronization is based around coherent shared memory. Emulating it on top of something that required manual triggering of syncing / flushing would be horrible for performance. (Very clever optimizations may prove when that can be delayed so not every release-store has to suffer, but seq-cst will probably be horrible. seq-cst has to establish a global order of operations that all threads agree on; that's hard unless a store becomes visible to all other threads at the same time.)

But note that in C++, actual simultaneous access is UB unless you do it with atomic<T>. This allows the optimizer to freely use CPU registers for locals, temporaries, and even globals without exposing registers as a language feature. UB allows optimization in general; that's why modern C/C++ implementations are not portable assembly language.
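
A minimal sketch of that model in C++11 terms: an acquire load that observes a release store establishes happens-before, which is exactly what makes reading the plain non-atomic payload safe without atomic<T>:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

int payload;                        // plain, non-atomic shared data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                   // non-atomic write: a data race on its own
    ready.store(true, std::memory_order_release);  // publishes everything before it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    // The acquire load synchronizes-with the release store, so the write to
    // payload happens-before this read: well-defined, prints 42.
    std::printf("%d\n", payload);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```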

The historical register keyword in C/C++ means a variable can't have its address taken, so even a non-optimizing compiler can keep it in a CPU register, not memory. We're talking about CPU registers, not the computer science RAM Machine "register = addressable memory location". (Like rax..rsp/r8..r15 on x86, or r0..r31 on MIPS). Modern compilers do escape analysis and naturally keep locals in registers normally, unless they have to spill them. Other types of CPU registers are possible, e.g. a register-stack like x87 FP registers. Anyway, the register keyword existed to optimize for this type of machine. But it doesn't rule out running on a machine with no registers, only memory-memory instructions.

C++ is designed to run well on a Von Neumann machine with CPU registers, but the C++ abstract machine (that the standard uses to define the language) doesn't allow execution of data as code, or say anything about registers. Each C++ thread does have its own execution context, though, and that models PRAM threads/cores each having their own program counter and callstack (or whatever an implementation uses for automatic storage, and for figuring out where to return.) In a real machine with CPU registers, they're private to each thread.

All real-world CPUs are Random Access Machines, and have CPU registers separate from addressable / indexable RAM. Even CPUs that can only compute with a single accumulator register typically have at least one pointer or index register that at least allows some limited array indexing. At least all CPUs that work well as C compiler targets.

Without registers, every machine instruction encoding would need absolute memory addresses for all operands. (Maybe like a 6502 where the "zero page", the low 256 bytes of memory, was special, and there are addressing modes that use a word from the zero page as the index or pointer, to allow 16-bit pointers without any 16-bit architectural registers. Or something like that.) See Why do C to Z80 compilers produce poor code? on RetroComputing.SE for some interesting stuff about real-world 8-bit CPUs where a fully compliant C implementation (supporting recursion and reentrancy) is quite expensive to implement. A lot of the slowness is that 6502 / Z80 systems were too small to host an optimizing compiler. But even a hypothetical modern optimizing cross-compiler (like a gcc or LLVM back-end) would have a hard time with some things. See also a recent answer on What is an unused memory address? for a nice explanation of 6502's zero-page indexed addressing mode: 16-bit pointer from an absolute 8-bit address in memory + 8-bit register.

A machine without indirect addressing at all couldn't easily support array indexing, linked lists, and definitely not pointer variables as first-class objects. (Not efficiently anyway)


What's efficient on real machines -> what idioms are natural

Most of C's early history was on PDP-11, which is a normal mem + register machine where any register can work as a pointer. Automatic storage maps to registers, or to space on the callstack when they need to be spilled. Memory is a flat array of bytes (or chunks of char), no segmentation.

Array indexing is just defined in terms of pointer arithmetic instead of being its own thing perhaps because PDP-11 could do that efficiently: any register can hold an address and be dereferenced. (vs. some machines with only a couple special registers of pointer width, and the rest narrower. That was common on an 8-bit machine, but early 16-bit machines like PDP-11 had little enough RAM that one 16-bit register was enough for an address).
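
That definition is directly visible in the language itself: a[i] is specified as *(a + i) after array-to-pointer decay, so even the reversed spelling compiles (a sketch, not a style recommendation):

```cpp
#include <cstdio>

int main() {
    int arr[4] = { 10, 20, 30, 40 };
    // arr[2] is defined as *(arr + 2): the array decays to a pointer and
    // indexing is pointer arithmetic -- which also makes 2[arr] legal.
    std::printf("%d %d %d\n", arr[2], *(arr + 2), 2[arr]);   // 30 30 30
}
```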

See Dennis Ritchie's article The Development of the C Language for more history; C grew out of B on PDP-7 Unix. (The first Unix was written in PDP-7 asm). I don't know much about PDP-7, but apparently BCPL and B also use pointers that are just integers, and arrays are based on pointer-arithmetic.

PDP-7 is an 18-bit word-addressable ISA. That's probably why B has no char type. But its registers are wide enough to hold pointers so it does naturally support B and C's pointer model (that pointers aren't really special, you can copy them around and deref them, and you can take the address of anything). So flat memory model, no "special" area of memory like you find on segmented machines or some 8-bit micros with a zero page.

Things like C99 VLAs (and unlimited size local variables) and unlimited reentrancy and recursion imply a callstack or other allocation mechanism for function local-variable context (aka stack frames on a normal machine that uses a stack pointer.)
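
A trivial sketch of why: each active call needs its own copy of its locals and its own place to return to, so *some* per-call allocation mechanism is implied, even though the standard never names a stack:

```cpp
#include <cstdio>

// Each recursive call gets its own `n` and its own return location; the
// implementation needs some per-call storage (a callstack, on normal
// machines) to keep the ten live activations separate.
unsigned long fact(unsigned long n) {
    return n <= 1 ? 1 : n * fact(n - 1);
}

int main() {
    std::printf("%lu\n", fact(10));   // 3628800
}
```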

Peter Cordes
  • 1
    Caveat: I'm interested in theoretical CS stuff, but I don't actually work in that field or pay super careful attention to a lot of this stuff. I may have distorted some things. And almost certainly could have edited this down to fewer words if I spent even more time on it. I think some of the key points are well-formatted and presented in this version of the answer, especially the section at the top, and at the bottom about PDP-7 / PDP-11 supporting pointers vs. 8-bit micros that don't nearly as easily. – Peter Cordes Oct 10 '19 at 09:35
4

I think trying to pin C++ (or most other languages) to a single architecture model is difficult at best. Let's consider C++ 98/03. As the question says, they fit with the Von Neumann model. Oh, but wait--they also fit about equally well (if not better) with Harvard architecture.

For that matter, Harvard Architecture is really more a family of models than a single model. In particular, a CPU is usually seen as using a Harvard Architecture if it has separate caches for code and data--even if it's something like an x86, where the hardware does its best to hide that split from the code (e.g., you can write self-modifying code, and after you've modified the code, what you execute will be the new code--though there can be a substantial penalty, because the instruction cache isn't optimized to deal with modifications).

But "Harvard Architecture" can also be used to describe things like some DSPs, which have two (or three) entirely separate memory buses connected to physically separate memory:

[figure: DSP with separate program and data memories on separate buses]

The language rules to accommodate this are actually fairly subtle--to the point that unless you were looking for them, it would be easy to miss them entirely. For example, C and C++ define a pointer to a function as a separate thing from a pointer to data. They're also pretty careful to avoid giving any guarantees about things like addresses being comparable except under fairly limited circumstances (e.g., in C++ you're not guaranteed anything about comparing the address of a function to the address of data).
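
A small sketch of that separation (the commented-out lines are the ones the rules are careful about):

```cpp
void f() {}

int main() {
    void (*fp)() = f;   // pointer to a function: points into the program
    int x = 0;
    void *dp = &x;      // pointer to an object: points into data

    // void *p = fp;    // ill-formed: no implicit conversion between function
    //                  // and object pointers, because code and data may live
    //                  // in entirely separate address spaces (e.g. on a DSP).
    // Even reinterpret_cast<void*>(fp) is only "conditionally-supported";
    // an implementation for such a machine may reject it.

    (void)fp; (void)dp;
}
```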

Since the C++11 standard, however, that's changed a bit. While the core language retains the basic character of having some stream of instructions that are executed in a specified order, the library adds the ability to create multiple threads that can execute in parallel. These are allowed to communicate via shared memory, but you have to use an atomic variable or a memory fence to guarantee any degree of success. That allows implementation on machines anywhere from extremely tightly coupled, to fairly loosely coupled, where (for example) communication that looks like shared memory may actually involve sending data over something like a network connection, with a signal sent to tell the far end when a transmission is complete.

So, again, the specification of the language isn't really tied to what would normally be seen as a single architecture at the hardware level. Rather to the contrary: while it probably works better for what would normally be thought of as fairly tightly coupled machines, I believe it could be realized on fairly loosely coupled machines such as a cluster of entirely separate, disparate machines. You'd typically need (or at least want) to change how you wrote your code, but at least in theory you could write portable C++ code that ran on either.

Jerry Coffin
  • *a CPU is usually seen as using a Harvard Architecture if it has separate caches for code and data* That sloppy terminology (instead of Modified Harvard) is usually only used when talking about bandwidth / performance, not computability. I refuse to call split L1 caches on top of a unified address-space and single bus a Harvard machine, and so should everyone else! In this context Harvard is (as you say) about having split address-spaces or at least split buses, allowing program in flash and data in RAM, for example. – Peter Cordes Oct 10 '19 at 08:14
  • C++ on hardware where you have to fake coherence with software is maybe theoretically possible but not plausible for practical performance reasons. Remember that a release-sequence has to make *all* preceding atomic and non-atomic operations visible to other threads that might synchronize-with it via an acquire load. i.e. it has to do a full sync. Also, unless you flush after every relaxed store, you at least risk violating the note that says stores should become visible to other threads promptly. (Like on normal coherent shared mem that always tries to drain its store buffer ASAP) – Peter Cordes Oct 10 '19 at 08:18
  • I'm also not confident you could reliably implement seq-cst over non-coherent SHM with more than 2 nodes. All threads have to agree on a global order of operations for seq_cst loads/stores (across objects). I guess it's probably doable if you're willing to wait for a network RTT after every seq_cst store, but that's hardly a viable implementation. C++ very much assumes that all threads will share *coherent* memory. Machines with non-coherent shared memory in real life (some clusters) use it for fast message-passing under software control (e.g. MPI), not for single-system-image / threads. – Peter Cordes Oct 10 '19 at 08:23
  • @PeterCordes: Well, I'll admit I haven't implemented it to be sure it'd work out, but it seems like there are some optimizations that could be done. What we're talking about is basically similar to distributed database updates, which have been studied for years and fairly efficient ways to avoid most of the difficulties have been found. – Jerry Coffin Oct 10 '19 at 14:45
  • 1
    @PeterCordes: As far as split caches (and such) being Harvard architecture or not: I mostly agree that it's sloppy terminology that I wish had never come into use--but the usage is now so common that (at best) miscommunication is almost inevitable if I try to treat Harvard Architecture as referring solely to machines with completely separate data and program storage. My real point was that the name is too widely abused to mean much--you need to specify more detail to be sure what you're saying isn't misunderstood. – Jerry Coffin Oct 10 '19 at 14:52
  • Yeah that's true about actual usage. In my own answer, I said "split-bus split-address-space Harvard" to make sure it was clear. But anyway, after further thought I'm pretty sure Harvard vs. Von Neumann wasn't the point *at all* of being a Von Neumann Language. Just like [Does the Harvard architecture have the von Neumann bottleneck?](//stackoverflow.com/q/54882556). Harvard is just a performance optimization of a stored-program machine that requires special instructions to modify program memory. Both have addresses and a sequential execution model, and the VN bottleneck. – Peter Cordes Oct 10 '19 at 16:08
2

C++ is a specification written in English in a standard. See n3337, a late draft of C++11.

As Jerry Coffin and Peter Cordes explain, the official model is a parallel random-access machine.

But you generally code in C++ by using a compiler and running your program under some operating system (e.g. Windows or Linux), unless you code for embedded systems. Many OSes provide dynamic loading facilities (e.g. dlopen(3) on Linux) and most computers can host C++ compilers.

In practice, you could then generate C++ code at runtime, fork a compilation of that generated C++ code as a plugin, then dlopen that generated plugin. And on Linux you can do that many times (e.g. have dozens of thousands of such generated plugins; see my bismon and manydl.c programs).
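
A minimal sketch of that generate/compile/dlopen cycle on Linux (the paths and the generated function name are invented for illustration, and error handling is trimmed; link with -ldl on older systems):

```cpp
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

int main() {
    // 1. Generate C++ source at runtime.
    std::FILE *src = std::fopen("/tmp/gen.cpp", "w");
    if (!src) return 1;
    std::fprintf(src, "extern \"C\" int gen_fn(int x) { return x * 2; }\n");
    std::fclose(src);

    // 2. Compile it into a shared object (a plugin).
    if (std::system("g++ -shared -fPIC -O2 /tmp/gen.cpp -o /tmp/gen.so") != 0)
        return 1;

    // 3. Load the plugin and call the freshly generated code: the running
    //    program has just extended its own stored program.
    void *plugin = dlopen("/tmp/gen.so", RTLD_NOW);
    if (!plugin) return 1;
    auto gen_fn = reinterpret_cast<int (*)(int)>(dlsym(plugin, "gen_fn"));
    std::printf("%d\n", gen_fn(21));   // 42
    dlclose(plugin);
}
```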

You could also use one of several JIT-compilation libraries from C++, such as libgccjit or LLVM.

Practically speaking, C++ programs can generate code at runtime then use it (even if that is outside of the C++ standard). And that is characteristic of Von Neumann machines.

Basile Starynkevitch
  • After thinking some more, I think the Harvard / Von Neumann distinction isn't the interesting one in this context. It's that programs are stored as instructions that are fetched and executed sequentially, vs. a fundamentally different model of computation like Cellular Automata. i.e. it's an imperative model of computation, lending itself to sequential imperative languages like C or x86 assembly. Updated my answer significantly with some theoretical CS stuff, and fun links like C not being Turing Complete (finite storage). – Peter Cordes Oct 10 '19 at 09:28