GCC doesn't have options to do that.
Instead, compile to asm and do some text manipulation on that output, e.g. `gcc -O3 -S foo.c`, then run some script on `foo.s` to odd-align before some function labels, before assembling and linking a final executable with `gcc -o benchmark foo.s`.
One simple way (that costs between 32 and 95 bytes of padding) is:
.balign 64 # byte-align by 64
.space 32 # emit 32 bytes (of zeros)
starts_half_way_into_a_cache_line:
testfunc1:
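To confirm where a function actually landed after linking, you can take its address modulo 64. This is only a rough sketch: `offset_in_line` is a hypothetical helper name, and the address in the usage comment is a made-up example rather than output from a real build.

```shell
# offset_in_line HEXADDR: print the address's offset within a 64-byte line.
# HEXADDR may be written with or without a leading 0x.
# (Hypothetical helper, not part of any toolchain.)
offset_in_line() {
    echo $(( 0x${1#0x} % 64 ))
}

# Typical use, assuming binutils nm and an already linked ./benchmark:
#   nm benchmark | awk '$3 ~ /^testfunc/ { print $1, $3 }'
#   offset_in_line 0000000000401160   # made-up address copied from nm output
```

An offset of 32 means the label starts half way into a 64-byte line, which is what the `.balign 64` / `.space 32` pair aims for.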
Tweaking GCC/clang output after compilation is in general a good way to explore what gcc should have done. All references to other code/data inside and outside the function use symbol names; nothing depends on relative distances between functions or absolute addresses until after you assemble (and link), so editing the asm source at this point is totally safe. (Another answer proposes copying final machine code around; that's very fragile, see the comments under it.)
An automated text-manipulation script will let you run your experiment on larger amounts of code. It can be as simple as
awk '/^testfunc.*:/ { print ".p2align 6; .skip 32"; print $0 }' foo.s
to do this before every label that matches the pattern `^testfunc.*`. (Assuming no leading-underscore name mangling.)
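Wrapped in a small function, the same awk idea slots into the compile / edit / assemble workflow. This is a sketch under assumptions: `pad_labels` and `foo_padded.s` are hypothetical names, and the gcc steps are left as comments because they need a real `foo.c`.

```shell
# pad_labels PATTERN FILE: print FILE with alignment directives inserted
# before every line matching PATTERN (an awk regex).
# Hypothetical helper; the directives are the same ones the one-liner emits.
pad_labels() {
    awk -v pat="$1" '$0 ~ pat { print ".p2align 6; .skip 32" } { print }' "$2"
}

# Typical use (assumes gcc is installed and foo.c defines testfunc1 etc.):
#   gcc -O3 -S foo.c -o foo.s
#   pad_labels '^testfunc.*:' foo.s > foo_padded.s
#   gcc -o benchmark foo_padded.s
```

Writing to a separate file keeps the unmodified compiler output around, so you can diff the two or build an unpadded baseline for comparison.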
Or even use `sed`, which has a convenient `-i` option to do it "in-place" by renaming the output file over the original; `perl` has something similar. Fortunately, compiler output is pretty formulaic, so for a given compiler it should be a pretty easy pattern-matching problem.
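A possible sed version, assuming GNU sed: both `-i` without a suffix argument and the one-line form of the `i` (insert) command are GNU extensions, so BSD/macOS sed would need different spelling. `pad_in_place` is a hypothetical helper name.

```shell
# pad_in_place FILE: insert the padding directives before every testfunc*
# label, editing FILE in place.
# Assumes GNU sed (one-line `i` command, -i with no backup suffix).
pad_in_place() {
    sed -i '/^testfunc.*:/i .p2align 6; .skip 32' "$1"
}

# Typical use:
#   pad_in_place foo.s
```

Because this overwrites the file, running it twice pads every matching label twice; regenerate `foo.s` with `gcc -S` before re-running.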
Keep in mind that the effects of code-alignment aren't always purely local. Branches in one function can alias (in the branch-predictor) with branches from another function depending on alignment details.
It can be hard to know exactly why a change affects performance, especially if you're talking about early in a function where it shifts branch addresses in the rest of the function by a couple bytes. You're not talking about changes like that, though, just shifting the whole function around. But it will change alignment relative to other functions, so tests that alternate between calls to multiple functions, or where the functions call each other, can be affected.
Other effects of alignment include uop-cache packing on modern x86, as well as alignment relative to fetch blocks. (Beyond the obvious effect of leaving unused space in an I-cache line.)
Ideally you'd only insert 0..63 bytes to reach a desired position relative to a 64-byte boundary. This section is a failed attempt at getting that to work.
`.p2align` and `.balign` (see footnote 1) support an optional 3rd arg which specifies a maximum amount of padding, so we're close to being able to do it with GAS directives. We can maybe build on that to detect whether we're close to an odd or even boundary by checking whether it inserted any padding or not. (Assuming we're only talking about 2 cases, not the 4 cases of 16-byte relative to 64-byte for example.)
# DOESN'T WORK, and maybe not fixable
1: # local label
.balign 64,,31 # pad with up to 31 bytes to reach 64-byte alignment
2:
.balign 32 # byte-align by 32, maybe to the position we want, maybe not
.ifne 2b - 1b
# there is space between labels 2 and 1 so that balign reached a 64-byte boundary
.space 32
.endif # else it was already an odd boundary
But unfortunately this doesn't work: `Error: non-constant expression in ".if" statement`. If the code between the `1:` and `2:` labels has fixed size, like `.long 0xdeadbeef`, it will assemble just fine. So apparently GAS won't let you query with a `.if` how much padding an alignment directive inserted.
Footnote 1: `.align` is either `.p2align` (power of 2) or `.balign` (byte) depending on which target you're assembling for. Instead of remembering which is which on which target, I'd recommend always using `.p2align` or `.balign`, not `.align`.