From what I've read, C++ before C++11 has no way to perform an atomic read of unaligned data. If that's correct, what is the best way to perform such a read on reasonably current x86 processors (PIII and later)? From my limited research, `LOCK MOV reg, mem` won't work, because the LOCK prefix is only allowed on instructions whose destination operand is memory. So am I stuck with XADD and CMPXCHG?
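
A minimal sketch of the CMPXCHG route, assuming GCC- or Clang-style inline assembly (a compiler extension, not standard C++; 64-bit MSVC has no inline asm at all) and an x86 target. The helper name `locked_read_u32_cmpxchg` is hypothetical. `<stdint.h>` is used because `<cstdint>` only arrived with C++11.

```cpp
#include <stdint.h>  // uint32_t (the C header, usable before C++11)

#if defined(__i386__) || defined(__x86_64__)
// Hypothetical helper: atomically reads 32 bits at a possibly unaligned
// address p using LOCK CMPXCHG.
// EAX starts at 0 and the replacement value is also 0:
//   - if *p == 0, CMPXCHG stores 0 (no visible change) and EAX stays 0;
//   - otherwise the compare fails and CMPXCHG loads *p into EAX.
// Either way EAX ends up holding the value that was in memory.
inline uint32_t locked_read_u32_cmpxchg(void* p) {
    uint32_t result = 0;         // EAX: the expected value
    const uint32_t desired = 0;  // only stored if *p was already 0
    __asm__ __volatile__(
        "lock cmpxchgl %2, (%1)"
        : "+a"(result)
        : "r"(p), "r"(desired)
        : "memory", "cc");
    return result;
}
#endif
```

A LOCKed CMPXCHG always performs a write cycle, even when the comparison fails, so the memory must be writable and the read costs as much as any locked read-modify-write. If the four bytes straddle a cache line, it becomes a split lock, which can be very slow.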

Number47
  • Wouldn't it be better to just make sure the data you read is, in fact, aligned? – Igor Tandetnik Sep 20 '13 at 14:16
  • Where did you see that lock may not work for reading? I've never heard that. – 500 - Internal Server Error Sep 20 '13 at 14:22
  • @Igor Tandetnik, no, I would lose a lot of memory if I did. We are talking about big arrays of around 100,000,000 objects, each made of an int + char. – Number47 Sep 20 '13 at 14:29
  • @500-InternalServerError, from the Intel® 64 and IA-32 Architectures Software Developer’s Manual: "The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated." – Number47 Sep 20 '13 at 14:33
  • @Number47 300M of memory is trivially cheap, unless you're working on some cramped embedded device (in which case please mention that in the question). – Mark B Sep 20 '13 at 14:36
  • Off-topic: you could invert your scheme. Instead of one big packed array of (int, char) objects, use two arrays, one of ints and one of chars, so all the data is aligned without wasting memory. – Jarod42 Sep 20 '13 at 14:46
  • @MarkB, plus something like 1 GB at peak for other data. ;) And I want this to run on 2 GB machines. Moreover, I strongly disagree that 300M is cheap. If someone runs several applications and the developers of each said something like this, that poor user would have a hard time. Modern PCs are multitasking machines, so I believe it's wrong to care only about our own application and waste resources that could still be useful to others. – Number47 Sep 20 '13 at 14:54
  • @Jarod42, thank you for the suggestion. I know about it, but I simplified the problem to avoid drowning in details. So yes, I've considered other solutions and do want to read unaligned data in this case. – Number47 Sep 20 '13 at 15:02
  • P6+ also guarantees atomicity of unaligned dword reads that do not cross cache line boundaries. But you don't have that guarantee either, and you want to target P3 as well. So, align your stuff, or be forced to use something like `xor eax, eax \ lock xadd [somewhere], eax` – harold Sep 20 '13 at 18:02
  • See [Why is integer assignment on a naturally aligned variable atomic on x86?](https://stackoverflow.com/a/36685056) re: the common subset of atomicity guarantees on AMD and Intel. AMD's guarantee is weaker than Intel's since-P6 guarantee. (P6 is the microarchitecture of Pentium Pro. Pentium II / III are based on it.) – Peter Cordes Sep 22 '22 at 02:28
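
Mark B's "300M" figure follows directly from the numbers in the question. A small sketch of the arithmetic, assuming a 4-byte `int` and a compiler that honors `#pragma pack` (GCC, Clang and MSVC do); the struct and function names are hypothetical:

```cpp
#include <stddef.h>  // size_t

// Hypothetical record type from the question: an int plus a char.
// With natural alignment the compiler pads it to 8 bytes (assuming a
// 4-byte int), so every element's int starts on a 4-byte boundary.
struct Padded {
    int  key;
    char tag;
};

// The same record packed to 5 bytes. In an array, element i starts at
// byte 5*i, so most elements' ints are misaligned.
#pragma pack(push, 1)
struct Packed {
    int  key;
    char tag;
};
#pragma pack(pop)

// Bytes needed to store n records in each layout.
inline size_t padded_bytes(size_t n) { return n * sizeof(Padded); }
inline size_t packed_bytes(size_t n) { return n * sizeof(Packed); }
```

For 100,000,000 records that is 800,000,000 bytes padded versus 500,000,000 bytes packed, a difference of 300,000,000 bytes.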
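
Jarod42's split layout, as a minimal C++03-compatible sketch; the class and member names are hypothetical. The ints live in a `std::vector<int>`, whose storage is suitably aligned for `int`, so no int is misaligned, and the payload stays at 5 bytes per record (with a 4-byte `int`).

```cpp
#include <stddef.h>
#include <vector>

// Hypothetical structure-of-arrays replacement for a packed array of
// (int, char) records: each field lives in its own contiguous array.
class KeyTagArrays {
public:
    explicit KeyTagArrays(size_t n) : keys_(n), tags_(n) {}

    size_t size() const { return keys_.size(); }

    // Record i is split across the two arrays.
    int&  key(size_t i) { return keys_[i]; }
    char& tag(size_t i) { return tags_[i]; }

    // Bytes of actual data, with no padding between records.
    size_t payload_bytes() const {
        return keys_.size() * sizeof(int) + tags_.size() * sizeof(char);
    }

private:
    std::vector<int>  keys_;  // every element naturally aligned for int
    std::vector<char> tags_;  // chars never have alignment problems
};
```

The trade-off is locality: one record's two fields now sit in different arrays, so reading both touches two separate memory locations.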
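
harold's `xor eax, eax \ lock xadd [somewhere], eax` can be written as a sketch, assuming GCC- or Clang-style inline assembly on an x86 target; `locked_read_u32_xadd` is a hypothetical name. Adding zero leaves memory unchanged, and XADD hands the old value back in the register.

```cpp
#include <stdint.h>  // uint32_t (the C header, usable before C++11)

#if defined(__i386__) || defined(__x86_64__)
// Hypothetical helper: atomically reads 32 bits at a possibly unaligned
// address p. XADD exchanges the register with memory and stores the sum,
// so with an addend of 0 memory keeps its value and the register receives
// the old contents, all as one locked operation.
inline uint32_t locked_read_u32_xadd(void* p) {
    uint32_t value = 0;  // the "xor eax, eax" step: the addend is zero
    __asm__ __volatile__(
        "lock xaddl %0, (%1)"  // value <- old *p; *p <- old *p + 0
        : "+r"(value)
        : "r"(p)
        : "memory", "cc");
    return value;
}
#endif
```

This is still a locked read-modify-write: the memory must be writable, and a value that straddles a cache line turns it into a slow split lock.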
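
The guarantees harold and Peter Cordes mention both come down to whether an access stays inside one aligned block: a cache line for Intel's P6-and-later rule and, as I understand the linked answer, one aligned 8-byte chunk for the subset that AMD also guarantees. A portable sketch of that check; the helper name is hypothetical, and the block size is a parameter because the cache-line size is CPU-specific (32 bytes on P6-family chips such as the Pentium III, 64 bytes on most later x86 CPUs).

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: true if an access of `size` bytes starting at
// address `addr` lies entirely inside one naturally aligned block of
// `block` bytes. Pass the cache-line size for the P6+ rule, or 8 for the
// stricter "within one aligned 8-byte chunk" condition.
inline bool fits_in_block(uintptr_t addr, size_t size, size_t block) {
    return (addr % block) + size <= block;
}
```

For example, a 4-byte read at offset 62 within a 64-byte line spans two lines, so neither guarantee covers it, while a 4-byte read at offset 3 of an aligned 8-byte chunk satisfies both.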
