Align C function to "odd" address

Question

I know from C Function alignment in GCC that I can align functions using

    __attribute__((optimize("align-functions=32")))

Now, what if I want a function to start at an "odd" address, as in, I want it to start at an address of the form 32(2k+1), where k is any integer?

I would like the function to start at address (decimal) 32 or 96 or 160, but not 0 or 64 or 128.
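Put another way, an address has the form 32(2k+1) exactly when it leaves remainder 32 modulo 64: it is 32-byte aligned but not 64-byte aligned. A minimal C sketch of that test (the helper name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr == 32 * (2k + 1) for some integer k >= 0,
   i.e. 32-byte aligned but NOT 64-byte aligned. */
bool is_odd_32_aligned(uintptr_t addr)
{
    return (addr & 63u) == 32u;
}
```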

Context: I'm doing a research project on code caches, and I want a function aligned in one level of cache but misaligned in another.

You could create a bunch of duplicate 32-byte aligned functions and check at run time to see if any of them happen to have the "odd" address you want. — Ross Ridge, Jan 09 '20 at 19:07
One idea that could work: turn off function alignment, then before the function, place `asm(".align 64; .space 32")` to manually misalign the function following. — fuz, Jan 09 '20 at 19:29
@fuz No guarantee that the function following in the C source will appear following the inline assembly in the generated assembly. — Ross Ridge, Jan 09 '20 at 19:41
@RossRidge Sure. GHC solves this problem using a Perl script that edits the assembly generated by gcc to insert the required data. — fuz, Jan 09 '20 at 19:43
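Ross Ridge's run-time idea could be sketched as follows, assuming GCC or clang on a target like x86-64 where a function pointer is the entry address. The copy names and the `tag` trick are illustrative: the bodies differ by a constant so that GCC's identical-code folding (`-fipa-icf`, on at `-O2`) cannot merge the copies. Placement is up to the compiler and linker, so finding no odd-aligned copy is possible.

```c
#include <stddef.h>
#include <stdint.h>

/* Near-identical copies of the function under test, each at least 32-byte
   aligned via the GCC/clang `aligned` function attribute. */
#define DEFINE_COPY(name, tag)                               \
    __attribute__((noinline, aligned(32))) int name(int x)   \
    {                                                        \
        return x * 3 + (tag);                                \
    }

DEFINE_COPY(work_copy0, 0)
DEFINE_COPY(work_copy1, 1)
DEFINE_COPY(work_copy2, 2)
DEFINE_COPY(work_copy3, 3)

typedef int (*work_fn)(int);

static work_fn const work_copies[] = { work_copy0, work_copy1, work_copy2, work_copy3 };

/* Index of the first copy whose entry address is 32 mod 64, or -1 if
   none of them happened to land at an "odd" 32-byte boundary. */
int find_odd_aligned_copy(void)
{
    for (size_t i = 0; i < sizeof work_copies / sizeof work_copies[0]; i++) {
        uintptr_t addr = (uintptr_t)work_copies[i];
        if ((addr & 63u) == 32u)
            return (int)i;
    }
    return -1;
}
```

The benchmark would then call through `work_copies[idx]`; a result of -1 means adding more copies or changing the link order.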

Peter Cordes · Answer 1 · 2020-01-09T21:16:12.483

GCC doesn't have options to do that.

Instead, compile to asm and do some text manipulation on that output: e.g. gcc -O3 -S foo.c, then run a script on foo.s that inserts "odd" alignment padding before some function labels, then assemble and link the final executable with gcc -o benchmark foo.s.

One simple way (which costs between 32 and 95 bytes of padding) is:

 .balign 64        # byte-align by 64
 .space 32         # emit 32 bytes (of zeros)
starts_half_way_into_a_cache_line:
testfunc1:
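To check the 32..95 figure, here is a C model of how many bytes those two directives emit for a given location-counter offset. It assumes the section start is itself 64-byte aligned (GAS normally raises a section's alignment to match the largest alignment directive in it); the function name is made up.

```c
#include <stdint.h>

/* Bytes emitted by ".balign 64" followed by ".space 32" when the
   location counter is at `offset` within a 64-byte-aligned section. */
uint64_t simple_odd_pad(uint64_t offset)
{
    uint64_t balign_pad = (64 - offset % 64) % 64;  /* .balign 64: 0..63 bytes */
    return balign_pad + 32;                          /* .space 32: total 32..95 */
}
```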

Tweaking GCC/clang output after compilation is in general a good way to explore what gcc should have done. All references to other code/data inside and outside the function use symbol names; nothing depends on relative distances between functions or on absolute addresses until after you assemble (and link), so editing the asm source at this point is totally safe. (Another answer proposes copying final machine code around; that's very fragile, see the comments under it.)

An automated text-manipulation script will let you run your experiment on larger amounts of code. It can be as simple as

    awk '/^testfunc.*:/ { print ".p2align 6; .skip 32" } { print }' foo.s > foo-padded.s

which inserts the padding before every label that matches the pattern ^testfunc.*: and copies every other line through unchanged. (This assumes no leading-underscore name mangling.)
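If you'd rather not depend on awk, the same filter is a few lines of C. This is a sketch assuming the same `^testfunc.*:` label pattern and the same GAS directives as the one-liner, and lines shorter than 4096 bytes; the function names are hypothetical. Calling `insert_odd_padding(stdin, stdout)` from a `main` gives a drop-in replacement for the awk command.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the awk pattern /^testfunc.*:/ : the line starts with
   "testfunc" and contains a ':' somewhere after that prefix. */
int is_testfunc_label(const char *line)
{
    return strncmp(line, "testfunc", 8) == 0 && strchr(line + 8, ':') != NULL;
}

/* Copies `in` to `out`, emitting the padding directives right before
   each matching label line. */
void insert_odd_padding(FILE *in, FILE *out)
{
    char line[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        if (is_testfunc_label(line))
            fputs(".p2align 6; .skip 32\n", out);
        fputs(line, out);
    }
}
```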

Or use sed, which has a convenient -i option to edit the file "in place" (it writes a new file and renames it over the original); perl has something similar. Fortunately, compiler output is pretty formulaic, so for a given compiler this should be an easy pattern-matching problem.

Keep in mind that the effects of code-alignment aren't always purely local. Branches in one function can alias (in the branch-predictor) with branches from another function depending on alignment details.

It can be hard to know exactly why a change affects performance, especially a change early in a function that shifts the addresses of all later branches in that function by a couple of bytes. You're not talking about changes like that, though, just shifting the whole function around. But that still changes its alignment relative to other functions, so tests that alternate calls between multiple functions, or where the functions call each other, can be affected.

Other effects of alignment include uop-cache packing on modern x86, as well as how code is split into fetch blocks (beyond the obvious effect of leaving unused space in an I-cache line).

Ideally you'd only insert 0..63 bytes to reach a desired position relative to a 64-byte boundary. This section is a failed attempt at getting that to work.
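The ideal amount itself is easy to compute when the offset is known: it is (target - offset) mod 64, with target = 32 here. A C sketch of that formula (names are hypothetical; `align` must be a power of two and `target < align`):

```c
#include <stdint.h>

/* Smallest padding (0..align-1) that moves `offset` to a position that is
   `target` bytes past an `align`-byte boundary. Unsigned wraparound makes
   (target - offset) correct modulo any power of two. */
uint64_t pad_to_phase(uint64_t offset, uint64_t align, uint64_t target)
{
    return (target - offset) & (align - 1);
}
```

The difficulty is not the arithmetic but getting GAS to evaluate it from the location counter.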

.p2align and .balign¹ support an optional 3rd arg which specifies a maximum amount of padding, so we're close to being able to do it with GAS directives. We can maybe build on that to detect whether we're close to an odd or even boundary by checking whether it inserted any padding or not. (Assuming we're only talking about 2 cases, not the 4 cases of 16-byte relative to 64-byte for example.)

# DOESN'T WORK, and maybe not fixable
1:  # local label
 .balign 64,,31     # pad with up to 31 bytes to reach 64-byte alignment
2:
 .balign  32        # byte-align by 32, maybe to the position we want, maybe not
.ifne 2b - 1b
  # there is space between labels 2 and 1 so that balign reached a 64-byte boundary
  .space  32
.endif       # else it was already an odd boundary

But unfortunately this doesn't work: Error: non-constant expression in ".if" statement. If the code between the 1: and 2: labels has fixed size, like .long 0xdeadbeef, it will assemble just fine. So apparently GAS won't let you query with a .if how much padding an alignment directive inserted.
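Working through what the sequence was meant to compute suggests a second problem, independent of the assembler error. Below is a C model (names hypothetical), assuming a 64-byte-aligned section and GAS's rule that `.balign` with a max-skip argument emits nothing when more padding than the maximum would be needed. When the location counter is already a multiple of 64, `.balign 64,,31` also inserts nothing, so the `.ifne` cannot tell that case apart from "too far away to pad" and the function would stay at an even boundary.

```c
#include <stdint.h>

/* Total padding the failed directive sequence would emit at `offset`. */
uint64_t failed_attempt_pad(uint64_t offset)
{
    /* .balign 64,,31: pad to 64 only if that takes <= 31 bytes */
    uint64_t need = (64 - offset % 64) % 64;
    uint64_t pad1 = (need <= 31) ? need : 0;
    uint64_t pos = offset + pad1;

    /* .balign 32 */
    uint64_t pad2 = (32 - pos % 32) % 32;

    /* .ifne 2b - 1b / .space 32 / .endif: only pad1 lies between the labels */
    uint64_t pad3 = (pad1 != 0) ? 32 : 0;

    return pad1 + pad2 + pad3;
}
```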

Footnote 1: .align is either .p2align (power of 2) or .balign (byte) depending on which target you're assembling for. Instead of remembering which is which on which target, I'd recommend always using .p2align or .balign, not .align.

score 2 · Answer 2 · answered Jan 09 '20 at 21:49

As this question is tagged assembly, here are two spots in my (NASM 8086) sources that "anti-align" the following instructions and data. (Here just with an alignment to even addresses, i.e. 2-byte alignment.) Both were based on the calculation done by NASM's align macro.

https://hg.ulukai.org/ecm/ldebug/file/683a1d8ccef9/source/debug.asm#l1161

        times 1 - (($ - $$) & 1) nop    ; align in-code parameter
        call entry_to_code_sel, exc_code

https://hg.ulukai.org/ecm/ldebug/file/683a1d8ccef9/source/debug.asm#l7062

                ; $ - $$        = offset into section
                ; % 2           = 1 if odd offset, 0 if even
                ; 2 -           = 1 if odd, 2 if even
                ; % 2           = 1 if odd, 0 if even
        ; resb (2 - (($-$$) % 2)) % 2
                ; $ - $$        = offset into section
                ; % 2           = 1 if odd offset, 0 if even
                ; 1 -           = 0 if odd, 1 if even
        resb 1 - (($-$$) % 2)           ; make line_out aligned
trim_overflow:  resb 1                  ; actually part of line_out to avoid overflow of trimputs loop
line_out:       resb 263
                resb 1                  ; reserved for terminating zero
line_out_end:

Here is a simpler way to achieve anti-alignment:

                align 2
                nop

This is more wasteful, though: it may use up 2 bytes if the target anti-alignment would already be satisfied before this sequence. My prior examples never reserve more space than necessary.
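To make the cost difference concrete, here is a small C model of both approaches, with `off` standing for NASM's `$ - $$` (function names are illustrative):

```c
#include <stdint.h>

/* Bytes used by "align 2" followed by "nop" at offset `off`. */
uint32_t simple_anti_align_cost(uint32_t off)
{
    uint32_t align_pad = (2 - off % 2) % 2;  /* align 2: 0 or 1 byte */
    return align_pad + 1;                     /* nop: 1 byte */
}

/* Bytes used by "times 1 - (($ - $$) & 1) nop" at offset `off`. */
uint32_t exact_anti_align_cost(uint32_t off)
{
    return 1 - (off & 1);
}
```

At an even offset both emit a single byte; at an odd offset the exact version emits nothing while `align 2` + `nop` emits two.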

NASM is a multi-pass assembler, unlike GAS which tries to be 1-pass. I think that's why NASM is able to support `($-$$) & 1` in what NASM calls a "critical expression". And BTW, I tested this and it does work for `-f elf64`, not just flat binary (where even absolute addresses are known at assemble time). This doesn't directly help with GCC output, but +1 anyway for something that's useful with NASM. — Peter Cordes, Jan 09 '20 at 21:54
@Peter Cordes: I don't think that multi-pass processing is needed for the particular case of an expression involving `$ - $$`, as the `$` value is known before assembling the next instructions. However, NASM does have some support for using `equ` to make a label that can be evaluated before whatever it is computed from is known. — ecm, Jan 10 '20 at 17:16
Not necessarily; the length of earlier branch instructions depends on how far away their targets are. (GAS somehow manages to optimize branch lengths of forward branches too, so it can't be as simple as exactly 1 pass.) If they `jmp` forward over this alignment padding, then the distance depends on the padding. But even regular `align` directives can make it hard to get an optimal solution to the branch-length optimization problem, so `$-$$` doesn't create a new problem. [Why is the "start small" algorithm for branch displacement not optimal?](//stackoverflow.com/a/34940237) — Peter Cordes, Jan 10 '20 at 17:21

score 0 · Accepted Answer · 2020-01-09T19:23:03.367

I believe GCC only lets you align on powers of 2.

If you want to get around this for testing, you could compile your functions as position-independent code (-fPIC or -fPIE) and then write a separate loader that manually copies the function into an area that was mmap'd as read/write, then change the permissions to make it executable. Of course, for a proper performance comparison you would want to make sure the aligned code you are comparing it against was also compiled with -fPIC/-fPIE.
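A minimal sketch of the loader idea, assuming x86-64 and POSIX `mmap`/`mprotect`. As the comments below point out, copying a *compiled* function is fragile (static data, calls, jump tables), so this copies six hand-assembled bytes that are position-independent by construction; the byte array and function names are illustrative. Because `mmap` returns page-aligned memory, copying to offset 32 yields an entry address that is 32 mod 64.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

/* x86-64 bytes for "mov eax, 42 ; ret": no static data, no calls. */
static const unsigned char ret42_code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

/* Copies the code to byte `offset` of a fresh page, makes the page
   read+execute, calls it, and stores the entry address it used.
   Returns the call's result, or -1 if unsupported or a syscall fails. */
int run_copy_at_offset(size_t offset, uintptr_t *entry_out)
{
#if defined(__x86_64__)
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset + sizeof ret42_code > (size_t)page)
        return -1;

    unsigned char *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem + offset, ret42_code, sizeof ret42_code);
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    /* Object-to-function pointer conversion: not ISO C, but POSIX relies on it. */
    int_fn fn = (int_fn)(void *)(mem + offset);
    *entry_out = (uintptr_t)(mem + offset);
    int result = fn();
    munmap(mem, (size_t)page);
    return result;
#else
    (void)offset;
    (void)entry_out;
    return -1;
#endif
}
```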

I can probably give you some example code if you need it, just let me know.

edited Jan 09 '20 at 19:23

answered Jan 09 '20 at 19:12

PIC/PIE doesn't make each function *separately* relocatable; it will use relative offsets to static code and data in the same executable. You actually want `-fno-pie -no-pie` and compile in 32-bit mode so access to static data will use *absolute* addresses which won't break when you move the machine code but not the data... Also maybe `-fno-plt` so dynamic library functions are called via absolute address of their GOT entry? But otherwise non-leaf functions will still break. Or don't do any of that and use my answer :P – Peter Cordes Jan 09 '20 at 19:16
@PeterCordes Interesting, but how will you avoid absolute jumps being generated in the code? – Jan 09 '20 at 19:21
Most ISAs (including x86) don't have any absolute direct jumps. Now that you mention it, both those methods would also break if the function contains a `switch` that compiles to a jump table. With `-fPIE`, access to the jump table itself would break (unless you also find that private static data and copy it to the same relative offset from the code). But then yes it might work because GCC uses relative jump-table offsets when making PIE code, even for targets that can fixup absolute addresses. With `-fno-pie`, the jump table itself will hold absolute addresses of the original, not the copy. – Peter Cordes Jan 09 '20 at 19:25
@PeterCordes Of course they do; I can't think of a single ISA I have ever used that didn't have them. On x86, opcode `0xEA` is one. – Jan 09 '20 at 19:35
@S E: `0EAh` is a **far** absolute jump though; this is rarely used in 386+ protected or amd64 long mode. – ecm Jan 09 '20 at 19:41
@SE: I was excluding x86 far jumps because GCC/clang will never ever emit that instruction, nor will any compiler targeting a mainstream x86 OS in 32 or 64-bit mode. And besides, that opcode doesn't even exist in 64-bit mode; there is no `jmp ptr16:64` far absolute direct, only memory-indirect far jump. https://www.felixcloutier.com/x86/jmp. Some ISAs have both relative and absolute direct jumps, e.g. MIPS has `b`ranch and `j`ump. But I think ARM (http://www.keil.com/support/man/docs/armasm/armasm_dom1361289865686.htm) only has relative direct jumps (and register indirect), like x86-64. – Peter Cordes Jan 09 '20 at 19:51
okay, but it seems like you are making a lot of assumptions about what the compiler will and won't emit. Even if you know it won't emit a certain instruction, what is stopping it from emitting a sequence like `MOV eax, address` `JMP eax`? I'm not trying to be pedantic, and I think that is what you were referring to with jump tables. I guess I just don't see how tweaking the entire generated assembly by hand would be any easier in your solution. – Jan 09 '20 at 20:00
@SE: Your answer is the one that depends on all these assumptions about code-gen! I'm just talking about some cases where it might not break. In the compiler's asm source output, all references to other locations are *symbolic*, via label names. e.g. a jump table would use `table: .long target1, target2`. Adding padding before a label is always safe (and doesn't even have to be NOPs when it's outside a function); **assembling+linking will resolve all references to labels into the correct absolute or relative encoding.** GCC / clang don't even keep track of byte position. – Peter Cordes Jan 09 '20 at 20:13
*what is stopping it from emitting a sequence like `MOV eax, address` ; `JMP eax`*: that would be less efficient, so compilers don't do that. You could get code-gen like that from un-optimized code that uses a function pointer, but that would be for a jump outside the function. Maybe also with GNU C computed goto (taking the address of a C `goto`-target label). I'm pretty sure a compiler-generated jump *within* a function will always use relative encoding except for the case of `switch`, and except of course on MIPS, where it could use `j` if you compile with `-fno-pie`. – Peter Cordes Jan 09 '20 at 20:18
But anyway, copying just the machine code of a single function somewhere else does not work if it accesses any static data or calls any other functions, *especially* when you compile it with `-fpie`. The only thing that does keep working for access to stuff outside the function is absolute addressing, but the only thing that keeps working *inside* the function is relative. So there's no general-case way for your answer to work. – Peter Cordes Jan 09 '20 at 20:20
