
Until recently, I thought CPU caches (at least at some levels) were 4 KB in size, so grouping my allocations into 4 KB slabs seemed like a good idea to improve cache locality.

However, reading Line size of L1 and L2 caches, I see that cache lines at every level are 64 bytes, not 4 KB. Does this mean my 4 KB slabs are useless, and that I should just go back to using regular malloc?

Thanks!

Verdagon
  • Cache associativity matters here too. Also, many systems want 4K alignment for vector work, so it's not a bad thing. The real question is: how small are the allocations you are making? Are you wasting a ton of memory to internal fragmentation? If so, you probably want a different system. Each CPU has different wants, so tailor your allocator to your own needs. – Michael Dorgan Apr 07 '18 at 00:10
  • Thanks for the comment! I'm writing a malloc for my programming language, which is more aware of mutable/immutable semantics in the data. So, I have no guarantees on the size of allocations. If I had to guess, I'd say most of them are small, and I'm probably wasting a pretty good amount to fragmentation, but not as much as malloc would (the semantics help with this). What would you recommend? Thanks! – Verdagon Apr 07 '18 at 00:12
  • Many systems I've created for embedded work use 2 allocators: one for allocations of 64 bytes or smaller, a unit allocator with O(1) complexity, and a more generic one for larger allocations. The smaller one is trivial to align to 64 bytes. For the larger one, I generally don't worry and leave it at 4-byte alignment, but the user may specify if they want something in particular – say, 4K alignment for a DMA buffer. Honestly, profile your data and see what sizes come in – e.g., track each allocation by base-2 size and count. This will tell you what size to make your allocators. – Michael Dorgan Apr 07 '18 at 00:15
  • I'll do that, thanks! – Verdagon Apr 07 '18 at 00:19

1 Answer


4 KB does have significance: it is a common page size for your operating system, and thus for entries in your TLB.

Memory-mapped files, swap space, and kernel I/O (writes to a file or socket) all operate at this page granularity, even if you're only using 64 bytes of the page.

For this reason, it can still be somewhat advantageous to use page-sized pools.

Cory Nelson
  • Thanks for the answer! Would you say that it's only useful for IO to secondary storage, and not so useful for applications that only really deal with RAM? – Verdagon Apr 07 '18 at 00:20
  • Correct. I would say always try to group into cache lines, and where convenient or where doing I/O, try to group into pages. – Cory Nelson Apr 07 '18 at 00:23
  • 4 KB is the most common size, but it is architecture-dependent. ... to check, use `sysconf(_SC_PAGE_SIZE)` (needs `unistd.h`). – technosaurus Apr 07 '18 at 05:20
  • Page-size slabs can also help reduce conflict misses if certain fields are hot and others are lukewarm or cold, or if there is significant temporal locality for certain field accesses (locally hot); most caches use power-of-two modulo indexing, so slab allocators can better control which cache sets fields map to. Pooled allocation (of which slab allocation is one type) also allows structure splitting with fixed offsets, to reduce cache-block internal fragmentation with respect to access timing/frequency (or to support SIMD processing). –  Apr 08 '18 at 20:59