
I want to allocate space for a large array that will be write-only until the very end of the program. For that reason, I don't care if it's cached.

I also want to access this very frequently, so I don't want to have to do a page walk more than once. For that reason I want it to be allocated in a large page (e.g. 4M).

So how can I...

  • ...request the memory to be either uncacheable or write-through?
  • ...request the memory to be placed in a large page?

I am working in Linux.

Nathan Fellman
  • Are you sure you want it to be uncacheable? Maybe a non-temporal store is what you want: http://stackoverflow.com/questions/37070/what-is-the-meaning-of-non-temporal-memory-accesses-in-x86 – Adrian Cox Apr 18 '11 at 08:18
  • @Adrian: non-temporal is also good. How can I tell the compiler to generate non-temporal stores? – Nathan Fellman Apr 18 '11 at 10:24
  • I haven't done this for a while, but there are some starting links here - http://stackoverflow.com/questions/661338/sse-sse2-and-sse3-for-gnu-c – Adrian Cox Apr 18 '11 at 11:00
  • Keep in mind that "write only" doesn't really imply that caching isn't useful. With the usual "write-back" memory type, writes imply a read: before writing, the CPU brings the corresponding line into the L1 cache, and then the write occurs in the L1. So if you do many writes to the same cache line, you _really_ want the line in the L1, since it allows the writes to complete very quickly. If you don't cache the writes (e.g., by using an NT store or UC memory) each write (or, at best, all writes for a cache line) has to go all the way to memory, which may take 100s of cycles. – BeeOnRope Jul 10 '17 at 18:54
  • ... so you really want that behavior only for memory that you only write once, more or less (or at least write infrequently enough that the caching behavior isn't relevant). You also want to write consecutive locations in a cache line all at once, to at least use the write-combining abilities so that you send one cache line down to memory at a time, rather than a few bytes at a time. – BeeOnRope Jul 10 '17 at 18:56

2 Answers


Disabling caching sounds like it would make your writes slower if it forces a write all the way through to the RAM. I'm not sure I'd attempt that at all.

To actually use large pages, I suggest following HugeTLB - Large Page Support in the Linux Kernel. It contains an example of how you can use large pages via a shared memory segment.

Mat

With transparent hugepages, simply allocating a 4M-aligned buffer will work. Use aligned_alloc or posix_memalign to get a pointer you can free. (Note that aligned_alloc is required to fail if the buffer size isn't a multiple of the alignment. /facepalm).

Depending on your setting for /sys/kernel/mm/transparent_hugepage/defrag, you may need to use madvise(MADV_HUGEPAGE) on the buffer to strongly encourage the kernel to use hugepages.

Also note that x86-64 uses 2M hugepages. x86-32 uses 4M hugepages. Aligning to 4M is fine if you want the easy solution for both.


request the memory to be either uncacheable or write-through?

AFAIK, you can't easily do that through normal Linux APIs. NT stores work to normal write-back memory, so use that instead. (They override the effective memory type for that store, and are weakly ordered and cache-bypassing.)

But if you're not writing full cache-lines at a time, you definitely want cached writes. Especially if there's any spatial or temporal locality, but even if not then letting the store buffer do its job (hiding the latency of cache-miss stores) is a good thing.

Peter Cordes