Is there a reasonable way for a Linux userspace program to enable or disable write-combining caching for a memory page that it owns?
The two target systems I care about: Intel Haswell processor on a 3.0 kernel, and Intel Skylake processor on a 4.8 kernel.
I'm tuning a mature, multi-threaded application that uses large buffers to transfer data between a producer and a consumer. Based on profiling, I have reason to believe that the application would benefit from the buffers' pages sometimes using write-combining caching, rather than write-back caching.
I considered populating the buffer with non-temporal writes instead, but that would require a larger refactoring than my current effort allows.
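For reference, the non-temporal approach I have in mind would look roughly like the sketch below. This is only an illustration, not code from my application: `stream_copy` is a hypothetical helper name, and it assumes an x86 target with SSE2 (it falls back to a plain `memcpy` elsewhere).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: copy n bytes from src to dst, using non-temporal
 * (streaming) stores for the 16-byte-aligned part of dst when SSE2 is
 * available. Without SSE2 this is just memcpy. */
static void stream_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    /* _mm_stream_si128 requires a 16-byte-aligned destination, so copy
     * the unaligned head bytes with ordinary stores first. */
    size_t head = (size_t)(-(uintptr_t)d & 15u);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Stream the aligned body 16 bytes at a time. */
    for (; n >= 16; d += 16, s += 16, n -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }

    /* Order the streaming stores before any later stores, e.g. the flag
     * that tells the consumer the buffer is ready. */
    _mm_sfence();
#endif
    /* Copy the remaining tail (or everything, without SSE2). */
    memcpy(d, s, n);
}
```

Every place in the producer that writes into the shared buffers would have to go through something like this, which is why it amounts to a bigger refactoring than I can take on right now.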
These two questions and this LWN article discuss the issue, but from the perspective of a device driver. In my case, I'm working with userspace code, running as non-root.
This 2008 paper discusses the different APIs for controlling a page's caching mode. It seems to indicate that a userspace application can obtain write-combining access to a page using mmap (see sections 5.3, 5.4 and 5.6), but the documentation isn't clear (to me, at least) about exactly how to use those mechanisms.