You have some significant misunderstandings about how the CPU cache (and really, the CPU itself, and the whole abstraction layer above it) works. .NET can't force anything into any CPU cache; that's solely the responsibility of the CPU and no one else. The cache is always a duplicate of RAM: if a value is in the cache (and still valid), it is also in RAM. In any case, all of those things are implementation details, and you can't rely on them anyway.
All of your questions require quite broad answers. The simple answer is that multi-threaded programming is very hard, and if you don't think so, you don't have much experience with it yet :) Once you realize how many assumptions and performance optimizations CPUs make, you'll also realize that C++ isn't really all that much closer to the "real hardware" than C# is.
All memory is shared across threads by default - if you pass the reference. This is bad, because it gives rise to synchronization issues. Some are caused by caching (whether in the CPU cache or even in CPU registers), and some by the fact that most of the operations you do are not atomic.
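To make the non-atomicity concrete, here's a minimal sketch (the class and field names are mine, purely for illustration): counter++ is really a read, an increment and a write-back, so two threads can read the same old value and overwrite each other's work.

using System;
using System.Threading.Tasks;

class LostUpdateDemo
{
    static int counter;

    static void Main()
    {
        // Each task increments the shared counter a million times.
        // counter++ is not atomic, so increments from the two threads
        // can interleave and get lost.
        Task t1 = Task.Run(() => { for (int i = 0; i < 1_000_000; i++) counter++; });
        Task t2 = Task.Run(() => { for (int i = 0; i < 1_000_000; i++) counter++; });
        Task.WaitAll(t1, t2);

        Console.WriteLine(counter); // almost always less than 2000000
    }
}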
Now, of course, if you're doing some isolated, CPU-bound work, you can gain a lot from being able to fit the whole working set into the CPU cache. You can only help this along by keeping your data structures small enough - you can't force a particular piece of data to be cached (in fact, every single thing you read from memory passes through the CPU cache at one point or another - the CPU can't read directly from RAM; RAM is far too slow). If your whole data set fits in the cache, and no one causes you to be evicted from it (remember, it's a multi-tasking environment), you can get amazing performance even from conventionally expensive operations (e.g. lots of jumps around in memory rather than sequential access).
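You can see this effect for yourself with a rough sketch like the one below (the names are mine, and the absolute numbers vary wildly between machines): chase pseudo-random indices through arrays of different sizes, and watch the time jump once the array no longer fits in the cache.

using System;
using System.Diagnostics;

class CacheFitDemo
{
    // Chase `steps` pseudo-random indices through an array of `size` ints.
    // `size` must be a power of two so the bitmask keeps the index in range.
    static long RandomWalk(int size, int steps)
    {
        var data = new int[size];
        long sum = 0;
        int idx = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < steps; i++)
        {
            sum += data[idx];
            idx = unchecked(idx * 1103515245 + 12345) & (size - 1); // cheap LCG step
        }
        Console.WriteLine($"{(long)size * 4 / 1024,8} KB: {sw.ElapsedMilliseconds} ms");
        return sum;
    }

    static void Main()
    {
        const int steps = 100_000_000;
        long total = 0;
        total += RandomWalk(1 << 13, steps); //  32 KB - fits in L1 on most CPUs
        total += RandomWalk(1 << 17, steps); // 512 KB - roughly L2-sized
        total += RandomWalk(1 << 25, steps); // 128 MB - spills out of every cache
        Console.WriteLine(total);            // keep the JIT from discarding the loops
    }
}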
As soon as you need to share data between threads, though, you start getting into trouble. You need synchronization to make sure the two CPUs (or CPU cores; I'm not going to distinguish between those) are actually working on the same data!
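For a simple counter like the one above, the cheapest fix is an atomic operation via Interlocked; for anything more involved, you'd reach for lock or higher-level primitives. A sketch of the atomic version:

using System;
using System.Threading;
using System.Threading.Tasks;

class AtomicCounterDemo
{
    static int counter;

    static void Main()
    {
        // Interlocked.Increment performs the read-modify-write as one
        // indivisible operation, so no increment is ever lost.
        Task t1 = Task.Run(() => { for (int i = 0; i < 1_000_000; i++) Interlocked.Increment(ref counter); });
        Task t2 = Task.Run(() => { for (int i = 0; i < 1_000_000; i++) Interlocked.Increment(ref counter); });
        Task.WaitAll(t1, t2);

        Console.WriteLine(counter); // reliably 2000000
    }
}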
Now, in practice, you'll find that CPU caches tend to be shared between cores to an extent. That's good, because sharing the CPU cache is still about an order of magnitude faster than synchronizing through RAM. However, you can still run into many issues, such as this pretty typical thread loop:
while (!aborted)
{
    ...
}
In theory, it is quite possible that this will simply happen to be an infinite loop. An aggressive compiler might see that you're never changing the value of aborted and simply replace !aborted with true (.NET will not), or it might store the value of aborted in a register.
Registers are never synchronized across threads automatically. This can be quite a problem if the body of the thread loop is simple enough. As you dive deeper into multi-threaded programming, you'll be completely devastated by the code you used to write and the assumptions you used to have.
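In .NET, the usual way to keep such a flag from being cached in a register is to mark it volatile (or to use Volatile.Read/Volatile.Write). A minimal sketch, with the class and field names being my own:

using System;
using System.Threading;

class AbortableWorker
{
    // 'volatile' tells the compiler and the JIT that this field can change
    // on another thread, so it must be re-read from memory on every pass
    // rather than cached in a register.
    static volatile bool aborted;

    static void Main()
    {
        var worker = new Thread(() =>
        {
            while (!aborted)
            {
                // ... do one unit of work ...
            }
        });
        worker.Start();

        Thread.Sleep(1000);
        aborted = true;   // without 'volatile', the worker might never see this
        worker.Join();
    }
}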
The most important thing to remember is that all those optimizations the compilers and CPUs make are only guaranteed not to change behaviour as long as the code runs isolated, on a single thread. When you break that assumption, all hell breaks loose.