2

If a C program changes one byte in a byte array, what machine instructions are executed? Does the hardware need to read 8 bytes, change one byte, and store the result (using 2 memory operations)?

Edit: Specifically on the x86-64 architecture.

Peter Cordes
curiousgeorge
  • 7
    There are many architectures out there. Your best bet is to write a small program doing this and see the resulting assembly. – Eugene Sh. Feb 04 '21 at 16:51
  • 1
    The whole point of C and other high-level languages is that you don't care. The compiler picks the instruction. – MSalters Feb 04 '21 at 16:52
  • Edited to add a specific architecture. – curiousgeorge Feb 04 '21 at 16:53
  • @MSalters Well, if your program is targeted to a specific architecture, it might be beneficial to know the way it works with data to make the program architecture-friendly. – Eugene Sh. Feb 04 '21 at 16:53
  • The hardware has a sophisticated caching system that does memory access with the granularity of 64 byte cache lines. However, this happens in the background and assembly instructions are tailored to each native size (8, 16, 32, 64 bits). Sometimes instructions act directly on memory, other times a piece of memory gets loaded into a register, modified, and written back. – Petr Skocik Feb 04 '21 at 17:00
  • 1
    [Can modern x86 hardware not store a single byte to memory?](https://stackoverflow.com/q/46721075/995714), [Can a bool read/write operation be not atomic on x86? (duplicate)](https://stackoverflow.com/q/14624776/995714) – phuclv Feb 04 '21 at 17:02
  • @phuclv: Pretty sure [Can modern x86 hardware not store a single byte to memory?](https://stackoverflow.com/q/46721075) fully covers it. The question itself shows compiler output using some byte-store instructions, and my answer discusses the fact that current x86 implementations have no internal RMW of a cache word. (Although many other non-x86 microarchitectures do, added that link, too. [Are there any modern CPUs where a cached byte store is actually slower than a word store?](https://stackoverflow.com/q/54217528)) – Peter Cordes Feb 04 '21 at 20:26

2 Answers

6

On x86-64, the hardware will read one cache line, modify the byte in cache, and eventually that cache line will be written back to memory.

The main reason for the write-back is that the CPU needs the cache line for other data. There are explicit instructions to force the write-back (such as `clflush`), but a C compiler is unlikely to use them: forcing an unnecessary write only slows down the CPU.
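To make the cache-line granularity concrete, here is a minimal sketch that computes which cache line a given byte belongs to. It assumes 64-byte lines, the size mentioned in the comments on the question; `CACHE_LINE_SIZE`, `line_base`, and `line_offset` are hypothetical names chosen for illustration, not part of any library.

```c
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current x86-64 CPUs. */
#define CACHE_LINE_SIZE 64u

/* Address of the first byte of the cache line that contains p. */
uintptr_t line_base(const void *p)
{
    return (uintptr_t)p & ~((uintptr_t)CACHE_LINE_SIZE - 1);
}

/* Position (0..63) of the byte p points to within its cache line. */
unsigned line_offset(const void *p)
{
    return (unsigned)((uintptr_t)p & (CACHE_LINE_SIZE - 1));
}
```

For example, with an array `buf` aligned to 64 bytes, `line_offset(&buf[5])` is 5, and `buf[0]` through `buf[63]` all share one line base, so a store to any of them modifies the same cached line.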

MSalters
2

It depends on the compiler, the optimization level, and so on. The simplest way to find out is to compile the code and disassemble it. As an example, we will compile the following code:

#include <stdio.h>

int main() {
    char a[] = "01234567890";
    a[5] = 'A';
    printf("%s\n", a);
}

// gcc -o main -std=c11 -Wall -Wextra -O0 main.c 

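We get the disassembly with objdump. The store to a[5] is the single `mov BYTE PTR [rbp-0xf],0x41` instruction (0x41 is ASCII 'A', and the array starts at rbp-0x14):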

 6c1:   48 b8 30 31 32 33 34    movabs rax,0x3736353433323130
 6c8:   35 36 37 
 6cb:   48 89 45 ec             mov    QWORD PTR [rbp-0x14],rax
 6cf:   c7 45 f4 38 39 30 00    mov    DWORD PTR [rbp-0xc],0x303938
 6d6:   c6 45 f1 41             mov    BYTE PTR [rbp-0xf],0x41
 6da:   48 8d 45 ec             lea    rax,[rbp-0x14]
 6de:   48 89 c7                mov    rdi,rax

// objdump -d ./main -Mintel | less
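The example above is built with `-O0`. With optimizations enabled, the compiler may fold the store into the array's initial contents, so the byte store can disappear from `main`. A minimal sketch that keeps the store visible is to put it in a separate function that receives the array as a pointer. `set_byte` is a hypothetical helper name, not taken from the code above.

```c
#include <stddef.h>

/*
 * Hypothetical helper: store one byte at buf[i].
 * The compiler cannot know the contents of buf here, so it has to
 * emit a real store for this assignment.
 */
void set_byte(char *buf, size_t i, char value)
{
    buf[i] = value;
}
```

Compiling this file with something like `gcc -O2 -S -masm=intel` and reading the generated assembly should typically show one byte-sized `mov` to memory for the assignment, with no separate load of the surrounding bytes, though the exact output depends on the compiler and its version.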
TigerTV.ru