Let's say I have an algorithm that optimizes subroutine ordering, in the hope of reducing TLB misses.

How would one actually reorder subroutines at run-time with the gcc compiler? So far I presume it would be possible to write a gcc plugin for this. However, I have no idea how to write one, let alone whether it is possible at all.

llllllllll
  • Related: [cache miss, a TLB miss and page fault](https://stackoverflow.com/questions/37825859/cache-miss-a-tlb-miss-and-page-fault) – Thomas Matthews May 23 '18 at 19:01
  • Are you getting TLB misses due to data fetching (memory accessing) or due to instruction fetching? – Thomas Matthews May 23 '18 at 19:03
  • There are a lot of "it depends" items. It depends on the memory on your machine, whether the OS has room for all of your code, how often a function is called, and how sparse the function calls are. There are not many TLB misses if a function is loaded and accessed frequently. However, you may get more TLB misses if your code is calling many different functions that are not in memory. The order is not as much of an issue as the length of the function and whether it needs to be replaced. – Thomas Matthews May 23 '18 at 19:17
  • The likely answer would be to place the functions in named sequential sub-sections (.text.prefix.1 ... .text.prefix.N) and then count on the linker placing those sub-sections in contiguous memory. – SoronelHaetir May 23 '18 at 19:19
  • You may get some benefit by ordering the functions by frequency of being called. Functions called more frequently should be placed closer to the caller than less frequently called functions. – Thomas Matthews May 23 '18 at 19:20
  • @ThomasMatthews Yes, I'm planning to do just that: place the functions that are more likely to be called closer together. The question is which tools I can actually use to do this. – Modestas Jurčius May 23 '18 at 19:59
  • This can certainly be done relatively easily, but at compile-time. Doing it at run-time is also technically possible, but is much more complicated and requires a lot of effort on your part. It's unusual to reorder functions at run-time. Are you sure this is what you want? – Hadi Brais May 24 '18 at 02:57
  • @HadiBrais I would be pleased if you could give me both scenarios on how I would have to do it, compile-time and run-time. – Modestas Jurčius May 24 '18 at 07:32
  • You should be able to put similar functions together with your compiler's inter-procedural profile-guided optimizations (PGO). The compiler should then be able to put hot (or temporally nearest) functions together, and cold ones far away. You need to compile your software once with instrumentation, run it to collect a profile, and then compile it again with the generated feedback. In gcc, the options are `-fprofile-generate` and `-fprofile-use`. Take care: if the test suite you use to generate the profile information is too different from the actual use of the software, it could perform worse than the default! – eugenioperez May 24 '18 at 15:15
  • @eugenioperez How does putting related functions together actually help to decrease TLB misses? I can see it reducing how much memory has to be traversed to reach the called function, but I would imagine the TLB to be independent of the order in the actual program. Maybe you have a specific optimization flag in mind? – Deandre Thomson May 24 '18 at 16:14
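
For what it's worth, the named-section idea from SoronelHaetir's comment could look roughly like the sketch below. This is only an illustration under assumptions: it targets an ELF platform (e.g. Linux) with gcc, the function names and bodies are made up, and the `.text.prefix.N` names simply follow the pattern from that comment. Whether the linker really keeps these sections contiguous and in this order depends on the linker script in use, so that part would have to be checked rather than assumed.

```c
/* Sketch only: hypothetical functions placed into explicitly named
 * sections with gcc's section attribute (ELF targets assumed).
 * noinline keeps each function as a separate body so that it
 * actually ends up in its own section. */

/* Assumed to be the hottest function, so it gets the first section. */
__attribute__((section(".text.prefix.1"), noinline))
int parse_input(int x)
{
    return x * 2;
}

/* Called right after parse_input in this made-up call chain. */
__attribute__((section(".text.prefix.2"), noinline))
int update_state(int x)
{
    return parse_input(x) + 1;
}

/* Last step of the made-up call chain. */
__attribute__((section(".text.prefix.3"), noinline))
int emit_output(int x)
{
    return update_state(x) - 3;
}
```

After compiling with `gcc -c`, `objdump -h` on the object file should list the three sections, and `nm -n` on the final linked binary shows the addresses the functions actually received. If the default placement does not give the wanted order, a custom linker script or GNU ld's `--sort-section=name` option may help, though I have not verified that for this particular layout.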

0 Answers