This is related to "Python: writing to memory in a single operation", which covered writing a single value to memory.
I'm revisiting this topic with the goal of efficiently writing contiguous blocks of memory via Python 3.8's memoryview. In particular, writing 32-bit values directly to RAM via /dev/mem, on an ARM64 Cortex A57 CPU.
Don't run this code! It may crash your computer!
OFFSET is special, and the qemu_ram_write: output is explained below. For now it is sufficient to know that it is reporting the individual data write operations on the system bus.
>>> import os, mmap
>>> OFFSET=0x0a000000
>>> fd = os.open("/dev/mem", os.O_RDWR) # DANGER
>>> mm = mmap.mmap(fd, 4096, offset=OFFSET)
>>> mv32 = memoryview(mm).cast("@I")
# This does a single memory write (as per the linked thread):
>>> mv32[0] = 1
qemu_ram_write: addr 0x0, data 0x1, size 0x4
# However this does two writes:
>>> data = memoryview(bytearray([1, 0, 0, 0])).cast("@I")
>>> mv32[0:1] = data
qemu_ram_write: addr 0x0, data 0x1, size 0x4
qemu_ram_write: addr 0x0, data 0x1, size 0x4
The same data is written twice!
A similar thing happens with four 32-bit values:
>>> data = memoryview(bytearray([x for x in range(4 * 4)])).cast("@I")
>>> mv32[0:0 + len(data)] = data
qemu_ram_write: addr 0x0, data 0x3020100, size 0x4
qemu_ram_write: addr 0x4, data 0x7060504, size 0x4
qemu_ram_write: addr 0x8, data 0xb0a0908, size 0x4
qemu_ram_write: addr 0xc, data 0xf0e0d0c, size 0x4
qemu_ram_write: addr 0x0, data 0x3020100, size 0x4
qemu_ram_write: addr 0x4, data 0x7060504, size 0x4
qemu_ram_write: addr 0x8, data 0xb0a0908, size 0x4
qemu_ram_write: addr 0xc, data 0xf0e0d0c, size 0x4
Notice how the four 32-bit words are written correctly, and then written again!
But this does not happen with eight 32-bit words:
>>> data = memoryview(bytearray([x for x in range(4 * 8)])).cast("@I")
>>> mv32[0:0 + len(data)] = data
qemu_ram_write: addr 0x0, data 0x3020100, size 0x4
qemu_ram_write: addr 0x4, data 0x7060504, size 0x4
qemu_ram_write: addr 0x8, data 0xb0a0908, size 0x4
qemu_ram_write: addr 0xc, data 0xf0e0d0c, size 0x4
qemu_ram_write: addr 0x10, data 0x13121110, size 0x4
qemu_ram_write: addr 0x14, data 0x17161514, size 0x4
qemu_ram_write: addr 0x18, data 0x1b1a1918, size 0x4
qemu_ram_write: addr 0x1c, data 0x1f1e1d1c, size 0x4
Notice how the words are written correctly, and only once. This is as expected. Writing 16 32-bit words also behaves as expected.
32 words is interesting in its own way:
>>> data = memoryview(bytearray([x for x in range(4 * 32)])).cast("@I")
>>> mv32[0:0 + len(data)] = data
qemu_ram_write: addr 0x40, data 0x43424140, size 0x4
qemu_ram_write: addr 0x44, data 0x47464544, size 0x4
qemu_ram_write: addr 0x48, data 0x4b4a4948, size 0x4
qemu_ram_write: addr 0x4c, data 0x4f4e4d4c, size 0x4
qemu_ram_write: addr 0x50, data 0x53525150, size 0x4
qemu_ram_write: addr 0x54, data 0x57565554, size 0x4
qemu_ram_write: addr 0x58, data 0x5b5a5958, size 0x4
qemu_ram_write: addr 0x5c, data 0x5f5e5d5c, size 0x4
qemu_ram_write: addr 0x0, data 0x3020100, size 0x4
qemu_ram_write: addr 0x4, data 0x7060504, size 0x4
qemu_ram_write: addr 0x8, data 0xb0a0908, size 0x4
qemu_ram_write: addr 0xc, data 0xf0e0d0c, size 0x4
qemu_ram_write: addr 0x10, data 0x13121110, size 0x4
qemu_ram_write: addr 0x14, data 0x17161514, size 0x4
qemu_ram_write: addr 0x18, data 0x1b1a1918, size 0x4
qemu_ram_write: addr 0x1c, data 0x1f1e1d1c, size 0x4
qemu_ram_write: addr 0x20, data 0x23222120, size 0x4
qemu_ram_write: addr 0x24, data 0x27262524, size 0x4
qemu_ram_write: addr 0x28, data 0x2b2a2928, size 0x4
qemu_ram_write: addr 0x2c, data 0x2f2e2d2c, size 0x4
qemu_ram_write: addr 0x30, data 0x33323130, size 0x4
qemu_ram_write: addr 0x34, data 0x37363534, size 0x4
qemu_ram_write: addr 0x38, data 0x3b3a3938, size 0x4
qemu_ram_write: addr 0x3c, data 0x3f3e3d3c, size 0x4
qemu_ram_write: addr 0x60, data 0x63626160, size 0x4
qemu_ram_write: addr 0x64, data 0x67666564, size 0x4
qemu_ram_write: addr 0x68, data 0x6b6a6968, size 0x4
qemu_ram_write: addr 0x6c, data 0x6f6e6d6c, size 0x4
qemu_ram_write: addr 0x70, data 0x73727170, size 0x4
qemu_ram_write: addr 0x74, data 0x77767574, size 0x4
qemu_ram_write: addr 0x78, data 0x7b7a7978, size 0x4
qemu_ram_write: addr 0x7c, data 0x7f7e7d7c, size 0x4
The correct number of writes is performed, but the order is all over the place!
And 64 words is weird too:
>>> data = memoryview(bytearray([x for x in range(4 * 64)])).cast("@I")
>>> mv32[0:0 + len(data)] = data
qemu_ram_write: addr 0x0, data 0x3020100, size 0x4
qemu_ram_write: addr 0x4, data 0x7060504, size 0x4
qemu_ram_write: addr 0x8, data 0xb0a0908, size 0x4
qemu_ram_write: addr 0xc, data 0xf0e0d0c, size 0x4
qemu_ram_write: addr 0x10, data 0x13121110, size 0x4
qemu_ram_write: addr 0x14, data 0x17161514, size 0x4
qemu_ram_write: addr 0x18, data 0x1b1a1918, size 0x4
qemu_ram_write: addr 0x1c, data 0x1f1e1d1c, size 0x4
qemu_ram_write: addr 0x20, data 0x23222120, size 0x4
qemu_ram_write: addr 0x24, data 0x27262524, size 0x4
qemu_ram_write: addr 0x28, data 0x2b2a2928, size 0x4
qemu_ram_write: addr 0x2c, data 0x2f2e2d2c, size 0x4
qemu_ram_write: addr 0x30, data 0x33323130, size 0x4
qemu_ram_write: addr 0x34, data 0x37363534, size 0x4
qemu_ram_write: addr 0x38, data 0x3b3a3938, size 0x4
qemu_ram_write: addr 0x3c, data 0x3f3e3d3c, size 0x4
qemu_ram_write: addr 0x40, data 0x43424140, size 0x4
qemu_ram_write: addr 0x44, data 0x47464544, size 0x4
qemu_ram_write: addr 0x48, data 0x4b4a4948, size 0x4
qemu_ram_write: addr 0x4c, data 0x4f4e4d4c, size 0x4
qemu_ram_write: addr 0x50, data 0x53525150, size 0x4
qemu_ram_write: addr 0x54, data 0x57565554, size 0x4
qemu_ram_write: addr 0x58, data 0x5b5a5958, size 0x4
qemu_ram_write: addr 0x5c, data 0x5f5e5d5c, size 0x4
qemu_ram_write: addr 0x60, data 0x63626160, size 0x4
qemu_ram_write: addr 0x64, data 0x67666564, size 0x4
qemu_ram_write: addr 0x68, data 0x6b6a6968, size 0x4
qemu_ram_write: addr 0x6c, data 0x6f6e6d6c, size 0x4
qemu_ram_write: addr 0x70, data 0x73727170, size 0x4
qemu_ram_write: addr 0x74, data 0x77767574, size 0x4
qemu_ram_write: addr 0x78, data 0x7b7a7978, size 0x4
qemu_ram_write: addr 0x7c, data 0x7f7e7d7c, size 0x4
qemu_ram_write: addr 0x80, data 0x83828180, size 0x4
qemu_ram_write: addr 0x84, data 0x87868584, size 0x4
qemu_ram_write: addr 0x88, data 0x8b8a8988, size 0x4
qemu_ram_write: addr 0x8c, data 0x8f8e8d8c, size 0x4
qemu_ram_write: addr 0x90, data 0x93929190, size 0x4
qemu_ram_write: addr 0x94, data 0x97969594, size 0x4
qemu_ram_write: addr 0x98, data 0x9b9a9998, size 0x4
qemu_ram_write: addr 0x9c, data 0x9f9e9d9c, size 0x4
qemu_ram_write: addr 0xa0, data 0xa3a2a1a0, size 0x4
qemu_ram_write: addr 0xa4, data 0xa7a6a5a4, size 0x4
qemu_ram_write: addr 0xa8, data 0xabaaa9a8, size 0x4
qemu_ram_write: addr 0xac, data 0xafaeadac, size 0x4
qemu_ram_write: addr 0xb0, data 0xb3b2b1b0, size 0x4
qemu_ram_write: addr 0xb4, data 0xb7b6b5b4, size 0x4
qemu_ram_write: addr 0xb8, data 0xbbbab9b8, size 0x4
qemu_ram_write: addr 0xbc, data 0xbfbebdbc, size 0x4
qemu_ram_write: addr 0xc0, data 0xc3c2c1c0, size 0x4
qemu_ram_write: addr 0xc4, data 0xc7c6c5c4, size 0x4
qemu_ram_write: addr 0xc8, data 0xcbcac9c8, size 0x4
qemu_ram_write: addr 0xcc, data 0xcfcecdcc, size 0x4
qemu_ram_write: addr 0xc0, data 0xc3c2c1c0, size 0x4 *
qemu_ram_write: addr 0xc4, data 0xc7c6c5c4, size 0x4 *
qemu_ram_write: addr 0xc8, data 0xcbcac9c8, size 0x4 *
qemu_ram_write: addr 0xcc, data 0xcfcecdcc, size 0x4 *
qemu_ram_write: addr 0xd0, data 0xd3d2d1d0, size 0x4
qemu_ram_write: addr 0xd4, data 0xd7d6d5d4, size 0x4
qemu_ram_write: addr 0xd8, data 0xdbdad9d8, size 0x4
qemu_ram_write: addr 0xdc, data 0xdfdedddc, size 0x4
qemu_ram_write: addr 0xe0, data 0xe3e2e1e0, size 0x4
qemu_ram_write: addr 0xe4, data 0xe7e6e5e4, size 0x4
qemu_ram_write: addr 0xe8, data 0xebeae9e8, size 0x4
qemu_ram_write: addr 0xec, data 0xefeeedec, size 0x4
qemu_ram_write: addr 0xf0, data 0xf3f2f1f0, size 0x4
qemu_ram_write: addr 0xf4, data 0xf7f6f5f4, size 0x4
qemu_ram_write: addr 0xf8, data 0xfbfaf9f8, size 0x4
qemu_ram_write: addr 0xfc, data 0xfffefdfc, size 0x4
If you count them, there are 68 (not 64) 4-byte writes above. The four that are duplicated are marked with a *.
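Counting by hand gets tedious for the larger transfers, so here is a minimal sketch of a log checker. It assumes the log lines have exactly the qemu_ram_write: format shown above; the function name summarize_writes is hypothetical, made up for illustration. It reports the total number of writes, which addresses were written more than once, and whether the addresses arrived in strictly increasing order.

```python
import re
from collections import Counter

# Matches lines such as "qemu_ram_write: addr 0x4, data 0x7060504, size 0x4".
# A trailing " *" marker on a line is simply ignored.
LINE_RE = re.compile(
    r"qemu_ram_write: addr (0x[0-9a-f]+), data (0x[0-9a-f]+), size (0x[0-9a-f]+)"
)


def summarize_writes(log_text):
    """Return (total writes, sorted duplicated addresses, in-order flag)."""
    addrs = [int(m.group(1), 16) for m in LINE_RE.finditer(log_text)]
    counts = Counter(addrs)
    duplicated = sorted(a for a, n in counts.items() if n > 1)
    # Strictly increasing addresses means no repeats and no reordering.
    in_order = all(a < b for a, b in zip(addrs, addrs[1:]))
    return len(addrs), duplicated, in_order


# The single-word slice assignment from earlier: one word, two writes.
sample = """\
qemu_ram_write: addr 0x0, data 0x1, size 0x4
qemu_ram_write: addr 0x0, data 0x1, size 0x4
"""
total, duplicated, in_order = summarize_writes(sample)
print(total, [hex(a) for a in duplicated], in_order)
```

Pasting a captured log into sample in place of the two lines above gives the same three figures for any transfer size.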
I've done some more testing and the results are interesting:
Number of 32-bit words to write | Actual number of 32-bit writes | Comment |
---|---|---|
1 | 2 | Doubled |
2 | 4 | Doubled |
4 | 8 | Doubled |
8 | 8 | OK |
16 | 16 | OK |
32 | 32 | Out of order |
64 | 68 | Extra 4 writes |
128 | 132 | Extra 4 writes |
230 | 244 | Extra 14 writes! |
256 | 260 | Extra 4 writes |
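For comparison, since the single-element assignment near the top produced exactly one write, a word-by-word loop is one alternative to slice assignment. This is only a sketch: the helper name write_words is hypothetical, a plain bytearray stands in for the /dev/mem mapping so it is safe to run, and I have not confirmed on the bus that every iteration results in exactly one write.

```python
def write_words(dst, start, words):
    """Store each word with its own single-element assignment."""
    for i, word in enumerate(words):
        dst[start + i] = word


# Stand-in for the mapped region: an ordinary buffer, NOT /dev/mem.
backing = bytearray(4 * 8)
mv32 = memoryview(backing).cast("@I")

write_words(mv32, 0, [0x03020100, 0x07060504, 0x0B0A0908, 0x0F0E0D0C])
print([hex(w) for w in mv32[:4]])
```

The loop is slower than a slice assignment, so it only makes sense where the number and order of bus transactions matters more than throughput.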
I'm running this in a guest in QEMU 7.0.0. I am getting these qemu_ram_write log entries from a custom QEMU device that I created, as part of the QEMU system host binary, that uses the internal QEMU memory_region_init_io (MMIO) API to hook up a callback to the .write operation at "physical RAM" offset 0x0a000000.
I have also verified this with a real Xilinx FPGA, on a Zynq platform, with an AXI Lite bus, logging the write transactions as they appear on the bus. I see the same unusual behaviour.
If I use busybox devmem 0x0a000000 32 0 or busybox devmem 0x0a000000 64 0 I see just one or two writes, as expected:
root@qemuarm64:~# devmem 0x0a000000 32 0x01020304
qemu_ram_write: addr 0x0, data 0x1020304, size 0x4
root@qemuarm64:~# devmem 0x0a000000 64 0x0102030405060708
qemu_ram_write: addr 0x0, data 0x5060708, size 0x4
qemu_ram_write: addr 0x4, data 0x1020304, size 0x4
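The two writes in the 64-bit case line up with a little-endian split of the value: the low 32 bits go to offset 0x0 and the high 32 bits to offset 0x4. A small sketch of that arithmetic, assuming a little-endian guest (which matches the data words in the logs above):

```python
import struct

value = 0x0102030405060708

# Pack as a little-endian 64-bit integer, then reread it as two 32-bit words.
low, high = struct.unpack("<II", struct.pack("<Q", value))

print(hex(low), hex(high))  # 0x5060708 0x1020304
```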
Based on this, and the previously linked question, I'm looking very suspiciously at Python's memoryview, or possibly mmap.mmap. So what is going on here? Why isn't memoryview behaving in the manner I would expect it to? What's with the out-of-order and extra writes?
Note: This is observed on ARM64. I haven't tested this with x86-64 yet, which I can only really do with QEMU (no FPGA available). If I do this I will report back.