
I recently asked a question on using volatile and was directed to read some very informative articles from Intel and others discussing memory barriers and their uses. After reading these articles I have become quite paranoid though.

I have a 64-bit machine. Is it safe to memcpy into adjacent, non-overlapping regions of memory from multiple threads? For example, say I have a buffer:

char buff[10];

Is it always safe for one thread to memcpy into the first 5 bytes while a second thread copies into the last 5 bytes?

My gut reaction (and some simple tests) indicate that this is completely safe, but I have been unable to find documentation anywhere that can completely convince me.

JaredC

4 Answers


Safe, yes. Performant, no, at least in this limited example. Remember that a cache line cannot be held for writing by two cores at once: core A will be forced to wait while core B writes to the buffer, then wait again while the line is transferred over, before it can write its own part. To avoid this effect, multi-core memory copies should be very large, so that each thread spends its time on cache lines it owns exclusively.

Puppy
  • This depends on the processor, on whether the memory was declared volatile, and on the optimization level of the compiler. Generally, each core will write to its own copy of the cache line, which the processor will reconcile later during a subsequent flush. This does mean that an actual memory synchronization barrier is necessary to get deterministic behavior when you try to read back the memory that has been written. – Adam Norberg Jan 14 '11 at 21:37
  • x86/x64 and many other processors are cache coherent (http://en.wikipedia.org/wiki/Cache_coherency). Two cores can hold the same cache line, and both copies will always be kept up to date so that they hold the same value. Of course, as said, accessing the same cache line from two cores has a huge impact on performance. – Timo Jan 14 '11 at 22:16
  • @Adam: Luckily I am only worried about writing-- any reads will happen far in the future. Otherwise, I'd probably be even _more_ paranoid. – JaredC Jan 14 '11 at 22:40
  • I was watching a video about Microsoft's latest concurrency library, and they said that cache line sharing was the biggest problem they had in making their apparently good code actually work. Fixing it changed code from 40% scaling to 100% scaling on 24 cores. False sharing is not a minor performance niggle; it can present a serious problem. – Puppy Jan 14 '11 at 22:46
  • Adam's still right that it depends on the processor, though. Questioner only says "I have a 64 bit machine". I'm 99.9% certain that means x64, but not all 64-bit processors are x64/IA64. There's talk going around of 64-bit ARM, and ARM architectures don't always have coherent caches, in which case those adjacent writes perhaps could be non-thread-safe. Then false sharing becomes a correctness issue rather than just a performance one. – Steve Jessop Jan 14 '11 at 23:46

Yes, it is completely safe; serialization of access to the memory bus is done in hardware.

Gene Bushuyev

As long as each call to memcpy writes into only its own part of the buffer, it is completely safe. Array allocations of any form in C++ are very low-level: an array is a contiguous block of storage of the appropriate size, and the array as an object that exists as anything more than a pointer to that storage is simply an illusion. Give memcpy non-overlapping ranges of the array, and it has no way of knowing that they aren't two completely separate arrays that just happen to be adjacent to each other. The writes won't interfere.

Adam Norberg
  • MemCpy? More like memcpy. What's more the OP is concerned about issues of simultaneous access to the memory from different threads. – David Heffernan Jan 14 '11 at 19:54
  • Your tactful and polite copy editing is appreciated in the cooperative spirit in which it was offered. Anyway, the OP is concerned about issues of simultaneous access to non-overlapping adjacent regions of memory in different threads, which is what I answered. – Adam Norberg Jan 14 '11 at 21:35

Yes. This is totally unrelated to any kind of happens-before ordering; it's just copying bytes around.

time4tea