
I've used mmap() on a file descriptor from open("/dev/mem") to map a block of physical memory shared between two processor cores in an ARM system. When the processor running Linux writes to that memory, there can be a lag of over one second before the other, non-Linux processor sees the written data. The long delay disappears if the Linux process issues this system() call just after writing to the memory:

system("sync; echo 3 > /proc/sys/vm/drop_caches" );

I've tried to duplicate that logic directly in code, but the long delay persists:

#include <fcntl.h>
#include <unistd.h>

int fd;
const char *data = "3";

sync();                                            /* flush filesystem buffers, as the shell command does */
fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
write(fd, data, sizeof(char));                     /* write "3" (no trailing newline) */
close(fd);

Why does the sync() call differ in behavior from the sync system command? Does the sync command effect virtual memory flushes that the sync() call does not?

I know the manual says that the sync program does nothing but exercise the sync(2) system call, but does the fact that I call sync() from userspace affect its behavior? It acts as though a call to sync from userspace merely schedules the sync rather than blocking until its completion.

alk
edj
  • This feels like a long shot, but your `echo` writes 2 bytes (`3` and `\n`) to `drop_caches`, while your other code only writes the `3`. Could that be the difference? –  Dec 26 '13 at 16:35
  • @WumpusQ.Wumbley `sizeof(char) == 1` – wildplasser Dec 26 '13 at 17:46
  • I tried both methods, with and without the trailing newline, but there was no difference. The system echo command as coded in my example does output the (implied) newline. – edj Dec 26 '13 at 17:54

2 Answers


You forgot the newline.

echo 3 outputs "3\n".

Additionally, you are taking an exceptionally circuitous route to implementing shared memory, and imposing massive costs on the rest of the operating system in doing so.

Every time you call sync-the-command or sync-the-system-call, you cause the OS to flush every filesystem on the entire computer; worse, by writing 3 to drop_caches you're telling the OS to throw away every filesystem cache it has, forcing it to re-read everything from disk. It's atrocious for the performance of the entire operating system in just about every way you can think of.

There is a much, much easier way.

Use shm_open() to create a named shared memory region. Use mmap to access it. Use memory barriers or shared/named mutexes on just that chunk of memory to ensure that you can read and write it consistently and safely.

In terms of overhead, your current approach is probably 1,000,000 times more costly than normal shared memory.
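
A minimal sketch of that shm_open() approach, for illustration only (the name "/shared_region", the 4096-byte size, and the omitted error checking are placeholders):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create (or open) a named shared memory object, size it, and map it. */
int fd = shm_open("/shared_region", O_CREAT | O_RDWR, 0600);
ftruncate(fd, 4096);
void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
/* Coordinate access through it with a process-shared mutex or memory barriers. */

On older glibc you may also need to link with -lrt for shm_open().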

antiduh
  • I have a separate question pending that addresses specifically how to flush memory mapped with mmap. See this [link](http://stackoverflow.com/questions/20750176/how-to-get-writes-via-an-mmap-mapped-memory-pointer-to-flush-immediately). The current question is more about trying to understand why two supposedly identical sync approaches behave differently. – edj Dec 26 '13 at 18:16
  • The shm_open suggestion doesn't seem like a solution that would work in this case because the second processor isn't running Linux. Am I wrong on that? – edj Dec 26 '13 at 18:20
  • Say what? You have two processors that can access the same memory, but are running two different operating systems? What kind of system architecture do you have? Are you talking about a subprocessor like a DSP or something? – antiduh Dec 26 '13 at 18:22
  • This is an asymmetric multiprocessing (AMP) system. One processor is running Linux for UI and general application development. The other processor is running real-time algorithms as part of a data acquisition system supporting an FPGA. The entire system (two ARM cores, shared memory controller, and an FPGA) is in a Xilinx Zynq chip. – edj Dec 26 '13 at 18:59
  • Ah. That sort of information would've been *very* helpful upfront :). All this talk of "shared memory" is barking up the wrong tree. You just want to cause the linux processor to write to some physical address and make sure that the processor completes that write. That doesn't need to involve sync or dropping VM buffers etc. – antiduh Dec 26 '13 at 19:14
  • You want to `open()` /dev/mem, and then `mmap()` on the /dev/mem file descriptor, setting the `offset` argument to the physical address that you need to write to (a sketch of this follows the comment thread below). Protection flags should probably be `PROT_READ | PROT_WRITE`. This sort of development is usually the preserve of kernel hackers, and you may find that you will need to write kernel-mode code eventually, if your system interactions become more complicated. See this kinda-related question for more pointers: http://stackoverflow.com/questions/647783/direct-memory-access-in-linux – antiduh Dec 26 '13 at 19:22
  • I already am using open(), /dev/mem, and mmap(). The problem comes about because the pointer returned from mmap() is virtual, which apparently makes it subject to kernel caching. I have other addresses in the physical memory space implemented as read/write registers by the FPGA (i.e., not part of RAM) and those don't show any latencies. Somehow, the kernel has handled the virtual pointer to access RAM differently than the same pointer used to access other addresses in the space. This virtual memory stuff is a mystery to me. – edj Dec 26 '13 at 19:48
  • The pointer returned by mmap is a 'virtual address', which means the actual physical location it corresponds to is only known by the kernel. This is *not* virtual memory; the kernel does not 'cache' reads and writes to memory made by processes. However, memory writes made by the cpu may be held in cache lines *in the cpu's L1/L2/L3 caches* until the *cache* is flushed, but that happens very frequently. – antiduh Dec 26 '13 at 19:54
  • Virtual memory is a system whereby the OS automatically writes RAM to swap to free up RAM when there's memory pressure. Virtual addresses are just page table mappings for how the CPU translates virtual reads/writes to physical reads/writes, so that the CPU and OS can implement process isolation. – antiduh Dec 26 '13 at 20:00
  • While I won't exclude the possibility that there's a problem with the linux CPU not writing to physical memory in a timely way, have you considered that the other CPU might not be reading from physical memory correctly? For instance, if I have a loop that reads from some pointer, the compiler may not issue any memory instructions in the loop because it has no reason to believe the memory backed by the pointer has changed; the compiler emits instructions that read from the memory into a register, loops comparing the register, and nothing else. Common bug - use volatile or similar. – antiduh Dec 26 '13 at 20:05
  • I mean, consider how the compiler would emit instructions for a normal user program. It's going to try to optimize memory access as much as possible because memory access is 1000s of times more expensive than reading/writing to a register. So just because you have a pointer to some physical address that you're reading from in your C code doesn't mean that the compiler wrote out instructions that are actually reading from that memory address. Though, the fact that this sync business on the linux CPU seems to fix it suggests that the problem is on the linux cpu. – antiduh Dec 26 '13 at 20:07
  • As a debugging technique, perhaps you could test the writes on the linux cpu by creating a second process on the linux cpu that reads from that same physical address. – antiduh Dec 26 '13 at 20:12
  • What is the code you're using for the open call to /dev/mem? Try opening it with O_SYNC. – antiduh Dec 26 '13 at 20:19
  • @antiduh Your definition of "virtual memory" is a quirky one that only Windows users use. Elsewhere it's pretty much synonymous with "virtual addressing". My theory is that back around the Windows 3.x days, people found the thing that sets the size of the swap file under a heading called "Virtual Memory Settings" and decided that "Virtual Memory means Swapping!" - kind of like they decided that "word" means "16 bits" and are still clinging to that on machines with 64-bit word sizes... –  Dec 26 '13 at 20:21
  • I'm guessing that since we're going through a file handle (the FD from open("/dev/mem")) to a device driver (the driver responsible for /dev/mem), we're not strictly talking about virtual->physical write problems, and this whole process is subject to file descriptor caching etc. I wish there were a better way to create a virtual->physical mapping for your process without having to write a kernel driver. I would still say that some of this probably belongs in kernel-land. – antiduh Dec 26 '13 at 20:21
  • @WumpusQ.Wumbley - my definition comes from my systems programming courses in college. Virtual addresses != virtual memory. Read my explanation here from a previous answer: http://stackoverflow.com/questions/9006634/mapping-of-virtual-address-to-physical-address/9007038#9007038 – antiduh Dec 26 '13 at 20:22
  • @WumpusQ.Wumbley, it would seem that Wikipedia disagrees with me, but is sorta inconsistent. They have separate articles for "virtual address space" and "virtual memory". I don't like conflating the two, but I'll concede that perhaps it's common. – antiduh Dec 26 '13 at 20:27
  • To answer one of the questions, the call to open the file is `fd = open("/dev/mem", O_RDWR | O_SYNC);`. – edj Dec 27 '13 at 00:53
  • To address a second question, I'm using volatile pointers to access the memory from the other processor, and the code is sitting in a tight loop looking for new content. What I will do for a sanity check is to have the second processor toggle an output pin the moment it detects the new content. This would prove that I'm not being fooled by caching in the other direction (i.e., writes by the second processor being seen late by Linux). – edj Dec 27 '13 at 00:59
  • To address a third question, I'm not sure that writing a second Linux process to read the memory would do much to clear up the confusion. How could I be sure that a successful read by the second process actually occurred from physical memory? Isn't it possible that the MMU would find the dirty page in cache and do the read from there, with the page not having yet been written to memory? – edj Dec 27 '13 at 01:03
  • Correct me if I am wrong, but my simplified understanding of how a typical MMU works is that the processor looks for page hits/misses with every virtual address access. In the case of a miss, the MMU must get involved to map the page to something, freeing room in the cache if necessary. Once the virtual address is known to be mapped to something directly accessible, the MMU allows the read/write to complete. I think my shared RAM has been marked as cacheable, so the written data will sit in cache until the kernel decides to flush it. Disabling cache on the shared memory would be ideal. – edj Dec 27 '13 at 01:17
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/43961/discussion-between-antiduh-and-edj) – antiduh Dec 27 '13 at 02:33
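
Pulling the comment thread above together, a sketch of the kind of /dev/mem mapping being discussed might look like the following; PHYS_ADDR and MAP_LEN are placeholders for the actual physical address and size of the shared block, and error checking is omitted:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define PHYS_ADDR 0x40000000UL   /* placeholder: physical address of the shared RAM block */
#define MAP_LEN   4096UL         /* placeholder: size of the block (a multiple of the page size) */

/* Map the physical block through /dev/mem; O_SYNC as discussed in the comments above. */
int fd = open("/dev/mem", O_RDWR | O_SYNC);
volatile unsigned char *shared = (volatile unsigned char *)
    mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, PHYS_ADDR);
shared[0] = 0x01;   /* access through a volatile pointer so the compiler actually emits the store */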

Neither drop_caches nor sync is appropriate here, as both deal with filesystem caches, which aren't what you're actually running into. The fact that sync appears to solve the problem is probably coincidental. (Launching the sync tool probably flushes the data cache incidentally.)

Your application is most likely running into cache synchronization issues across the two processor cores on your system. Try using the cacheflush() system call to solve this:

#include <unistd.h>
#include <asm/unistd.h>
...
/* Flush the CPU caches for the range [mapping_ptr, mapping_ptr + mapping_length). */
syscall(__ARM_NR_cacheflush, mapping_ptr, mapping_ptr + mapping_length, 0);

Note that you will probably need to flush the cache in both processes to see the correct results.

Flushing changes to mapped memory is often necessary for other mapped devices, but I think it may not be needed in this case. Trying an msync() as well couldn't hurt, though:

msync(mapping_ptr, mapping_length, MS_SYNC); // on process writing to memory
msync(mapping_ptr, mapping_length, MS_INVALIDATE); // on process reading from memory

Finally, make sure you are mapping this memory with the MAP_SHARED flag.
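
Putting those pieces together, the write-side sequence might look roughly like this; mapping_ptr and mapping_length come from the mmap() call, and whether the msync() actually matters here is uncertain, as noted above:

#include <sys/mman.h>     /* msync */
#include <unistd.h>       /* syscall */
#include <asm/unistd.h>   /* __ARM_NR_cacheflush (ARM-specific) */

/* After updating the shared mapping on the Linux side: */
msync(mapping_ptr, mapping_length, MS_SYNC);        /* push any dirty pages for the range */
syscall(__ARM_NR_cacheflush, mapping_ptr,
        (char *)mapping_ptr + mapping_length, 0);   /* flush the CPU data cache for the range */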

  • According to the manual, the cacheflush system call is available only on MIPS-based systems. I have an ARM system, without the asm/cachectl.h header file, so that code snippet doesn't compile. I also tried msync(), fsync(), and fdatasync(), but none of those affected the latency. For the record, I do use MAP_SHARED on the memory mapping. I also tried adding O_DIRECT to the associated open call, but my kernel throws an invalid parameter error on that flag. – edj Dec 26 '13 at 17:50
  • The manual page is a bit misleading - `cacheflush()` exists on ARM as well, but has a different argument list, and isn't currently wrapped by libc. I've updated the sample code. –  Dec 26 '13 at 19:42
  • For what it's worth, it may also work to mark that hardware memory range as uncacheable. This will take a lot more magic than I'm familiar with, though. –  Dec 26 '13 at 19:44