Is it possible to get LLVM-IR from Assembly file?

Question

I compiled an assembly file using the command:

clang-8 -c funcs.s -o funcs.o -emit-llvm

I found that the resulting .o file was in ELF format. I expected to see LLVM IR bitcode (with the "BC" characters at the beginning of the resulting file).

Seems, Clang ignores "-emit-llvm" flag.
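To check what clang actually produced, look at the first bytes of the output file. An ELF object starts with 0x7F 'E' 'L' 'F'. A raw LLVM bitcode file starts with 'B' 'C' 0xC0 0xDE. The helper names below (`classify_magic`, `classify_file`) are hypothetical. This is a minimal sketch: it only knows those two magic numbers and ignores other formats, such as Mach-O objects or the bitcode wrapper header.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Possible results of inspecting a file's first four bytes. */
typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_LLVM_BC } file_format;

/* Classify a buffer by its magic number.
   ELF object:   0x7F 'E' 'L' 'F'
   LLVM bitcode: 'B' 'C' 0xC0 0xDE (raw bitcode; the wrapper format is not handled) */
file_format classify_magic(const unsigned char *buf, size_t len) {
    static const unsigned char elf_magic[4] = {0x7F, 'E', 'L', 'F'};
    static const unsigned char bc_magic[4]  = {'B', 'C', 0xC0, 0xDE};
    if (len < 4) return FMT_UNKNOWN;
    if (memcmp(buf, elf_magic, 4) == 0) return FMT_ELF;
    if (memcmp(buf, bc_magic, 4) == 0) return FMT_LLVM_BC;
    return FMT_UNKNOWN;
}

/* Read the first four bytes of the file at `path` and classify them. */
file_format classify_file(const char *path) {
    unsigned char buf[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL) return FMT_UNKNOWN;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return classify_magic(buf, n);
}
```

For the object described above, `classify_file("funcs.o")` would report `FMT_ELF`, which matches what the question observed.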

Reopened this one (forgot about the dupe hammer). Possible duplicate of https://stackoverflow.com/questions/8288251/is-it-possible-to-translate-an-assembly-language-to-llvm-ir-optimize-it-and-the — Michael, May 15 '19 at 09:29
@Michael My question is narrower, IMHO. Moreover, the possible duplicates were asked 7-8 years ago, and there have probably been a lot of changes in clang/LLVM since then. P.S. Thank you anyway; it is useful information in the context of this question. — lol lol, May 15 '19 at 09:37
The LLVM toolchain has never and probably will never translate assembly source files into LLVM bitcode. They are always just passed through as-is to the assembler (generating object code). So there is no LLVM bitcode the compiler can give you. — fuz, May 15 '19 at 12:27
@fuz Thank you for your opinion! I think, llvm team must have such a possibility. Why? For example, to have full support of CFI. For the gcc compiler, I've seen that its assembly output contains statements like ".cfi_startproc", ".cfi_endproc", etc. I haven't found an equivalent for clang/LLVM. — lol lol, May 15 '19 at 16:02
@lollol The x86 code generator actually generates appropriate CFI directives from LLVM bitcode automatically. CFI directives do not make any sense in LLVM bitcode as they describe the layout of the stack frame which is something that doesn't quite exist before LLVM bitcode is turned into assembly. I'm not sure why you think that LLVM needs to have this feature. At no point is assembly code ever turned into LLVM bitcode and there is no reason why it ever should. And as far as I know, the x86 assembler shipped with LLVM does recognise CFI directives. — fuz, May 15 '19 at 23:49
@fuz It seems to me that you've misunderstood me. I was not talking about CFI directives for LLVM bitcode. I was talking about Clang supporting such directives when it processes source code written in assembly. Why is it needed? For example, to make the sanitizer operate correctly when cross-language procedure calls are performed. Take an indirect call where an assembly procedure (which takes 2 arguments) is called from a C procedure (which passes only 1 argument). To do correct analysis (sanitizers, LTO, etc.), Clang requires complete information. — lol lol, May 16 '19 at 07:23
@fuz How is it possible to provide such information when we're working with assembly files? By using CFI directives (here are the bounds of the procedure, here are the registers for the arguments, etc.). — lol lol, May 16 '19 at 07:26
@lollol The CFI directives are needed to unwind functions when asynchronous exceptions are generated. They are not needed for the sanitiser as far as I'm concerned. Refer to the amd64 SysV ABI document for details. If you write assembler code, you need to add appropriate CFI directives to your code yourself. LLVM bitcode doesn't enter the picture here and quite frankly, I have no idea why you claim “I think, llvm team must have such a possibility. Why? For example, to have full support of CFI.” — fuz, May 16 '19 at 09:31
@fuz Thanks for the information about how CFI directives are applied. I will check it. Yes, I agree that CFI directives must be added manually if we're writing in assembly. And regarding "Why? For example, to have full support of CFI.": I still think it would not be a bad idea to have such CFI support. — lol lol, May 16 '19 at 10:12
@lollol I'm not sure what sort of support you are missing. CFI directives are completely supported by the assembler shipped with LLVM. And note that you asked “Why? For example, to have full support of CFI.” How is having an assembly-to-bitcode decompiler needed to have full support of CFI? That doesn't make any sense at all. — fuz, May 16 '19 at 10:50
@fuz You're talking about CFI directives in the context of "CFI directives are needed to unwind functions when asynchronous exceptions are generated". I, on the other hand, mean CFI as in the "Control Flow Integrity" feature. That is the source of our misunderstanding, I guess. — lol lol, May 16 '19 at 14:22
@lollol CFI stands for “call frame information”; these are directives like `.cfi_startproc`. It would be quite funny if we were talking about two entirely different things the whole time. Anyway, I do understand your point of view and I must disappoint you: LLVM cannot edit your assembly code to add control-flow integrity code. — fuz, May 16 '19 at 16:04
@fuz I'm considering the case where CFI directives are added to the assembly code manually. — lol lol, May 17 '19 at 07:43

score 4 · Answer 1 · answered May 15 '19 at 10:15


Your question isn't fundamentally different from Is it possible to translate an assembly language to LLVM IR, optimize it and then recompile it to a different architecture?.

asm source and binary executables / object files are basically equivalent for this problem. You're still trying to decompile to LLVM-IR. This is hard, and I don't know if a decompiler exists.

Seems, Clang ignores "-emit-llvm" flag.

No, it just didn't affect any of the steps involved in the operation you asked it to do.

You asked your compiler to compile to a .o, so it did so.

If the input had been .c and the output a .s, it would have been able to emit LLVM-IR, but in this case LLVM-IR wasn't part of the process of assembling a .s to a .o.

No LLVM-IR representation of the program ever existed while clang was running, so there was nothing to emit.
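The point can be made concrete with a deliberately simplified model of which stages a compiler driver runs to turn an input into a .o. The stage names and the `stages_for` / `can_emit_llvm` helpers are hypothetical and are not clang's actual logic. The real driver has many more cases, and clang's `-ccc-print-phases` option lists the phases it really plans. The model only shows why `-emit-llvm` has nothing to act on for assembly input.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stages of a compiler driver (bit flags). */
enum stage { PREPROCESS = 1, COMPILE_TO_IR = 2, CODEGEN = 4, ASSEMBLE = 8 };

/* Return the set of stages run to turn a file with this suffix into a .o.
   Simplified model for illustration only. */
unsigned stages_for(const char *suffix) {
    if (strcmp(suffix, ".c") == 0)   /* C source: IR exists between compile and codegen */
        return PREPROCESS | COMPILE_TO_IR | CODEGEN | ASSEMBLE;
    if (strcmp(suffix, ".S") == 0)   /* assembly that goes through the C preprocessor */
        return PREPROCESS | ASSEMBLE;
    if (strcmp(suffix, ".s") == 0)   /* plain assembly goes straight to the assembler */
        return ASSEMBLE;
    return 0;                        /* unknown input: nothing modelled */
}

/* -emit-llvm can only dump IR if a stage that produces IR actually runs. */
bool can_emit_llvm(const char *suffix) {
    return (stages_for(suffix) & COMPILE_TO_IR) != 0;
}
```

In this model, `can_emit_llvm(".c")` is true and `can_emit_llvm(".s")` is false, which mirrors the answer's explanation.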


Peter Cordes


Thank you for the answer. I expected at least a warning, for example "-emit-llvm flag was ignored". Yes, I asked for a .o file, but that is just an extension. Frankly speaking, I don't agree with this answer, at least not with all of your statements. This is not a decompiler, but a translator from assembly to another language (LLVM-IR). – lol lol May 15 '19 at 15:54
@lollol: translating from asm to compiler input = decompiling. – Peter Cordes May 15 '19 at 16:03
Right. My fault! – lol lol May 15 '19 at 16:05
But wait. What about https://en.wikipedia.org/wiki/Decompiler ? To me, a decompiler is something like a disassembler... – lol lol May 15 '19 at 16:05
@lollol: LLVM-IR is a *slightly* higher level language than x86 machine code or assembly. It separates the semantic meaning of the program's logic from the specific implementation choices made to implement that logic. With just asm, you don't know whether a change to a register value was a true output from a block of code or just a dead temporary. (Programs can contain inline asm or be hand-written, so you don't get to assume that the asm was compiler-generated and follows certain patterns.) Therefore that Wiki article is in agreement with what I'm saying. – Peter Cordes May 15 '19 at 16:10
I agree that assembly is slightly lower level than C. On the other hand, I cannot agree that the Wiki page mentioned agrees with what you were saying :) "A decompiler is a computer program that takes an executable file as input, and attempts to create a high level source file which can be recompiled successfully.". An assembly file is not executable. Generally, assembly is at a somewhat higher programming level than machine instructions. Anyway, let's skip this useless discussion :) – lol lol May 15 '19 at 16:21
@lollol oh come on, asm and machine code are essentially equivalent. There's pretty close to a 1:1 correspondence between a machine-code instruction and an asm source line. A decompiler is *much* harder to write than a disassembler. A disassembler is mostly just some lookup tables + rules. A decompiler has to figure out function boundaries, especially if it's going to work on stripped executables (no symbol table) – Peter Cordes May 15 '19 at 16:28
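Here is a toy decoder that illustrates "a disassembler is mostly lookup tables + rules". It covers only a handful of one-byte x86-64 opcodes: PUSH r64 (0x50+r), POP r64 (0x58+r), NOP (0x90) and RET (0xC3). The function and table names are hypothetical. A real disassembler such as llvm-objdump also handles prefixes (REX etc.), ModRM/SIB bytes, immediates and thousands of opcodes. The structure is still the same: look the byte up, apply a rule for its operands.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Register names indexed by the low 3 bits of the opcode (no REX prefix). */
static const char *const reg64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/* One row of the lookup table: an opcode byte range and how to print it. */
struct opcode_entry {
    unsigned char first, last;  /* inclusive opcode range covered by this row */
    const char *mnemonic;
    int reg_in_low_bits;        /* 1 if the low 3 bits of the opcode select a register */
};

static const struct opcode_entry opcode_table[] = {
    {0x50, 0x57, "push", 1},
    {0x58, 0x5F, "pop",  1},
    {0x90, 0x90, "nop",  0},
    {0xC3, 0xC3, "ret",  0},
};

/* Decode the instruction at code[0] into out. Returns its length in bytes
   (always 1 for this toy table), or 0 if the opcode is not in the table. */
size_t decode_one(const unsigned char *code, char *out, size_t outsz) {
    for (size_t i = 0; i < sizeof opcode_table / sizeof opcode_table[0]; i++) {
        const struct opcode_entry *e = &opcode_table[i];
        if (code[0] < e->first || code[0] > e->last) continue;
        if (e->reg_in_low_bits)
            snprintf(out, outsz, "%s %s", e->mnemonic, reg64[code[0] & 7]);
        else
            snprintf(out, outsz, "%s", e->mnemonic);
        return 1;
    }
    return 0;
}

/* Disassemble len bytes into out, one instruction per line. Stops at the
   first unknown opcode or when out is full; returns bytes decoded. */
size_t disassemble(const unsigned char *code, size_t len, char *out, size_t outsz) {
    size_t pos = 0;
    out[0] = '\0';
    while (pos < len) {
        char line[32];
        size_t n = decode_one(code + pos, line, sizeof line);
        if (n == 0) break;
        size_t used = strlen(out);
        if (used + strlen(line) + 2 > outsz) break;  /* need room for '\n' and NUL */
        snprintf(out + used, outsz - used, "%s\n", line);
        pos += n;
    }
    return pos;
}
```

For example, the bytes `55 5D C3` disassemble to `push rbp`, `pop rbp`, `ret`. This is a purely mechanical mapping with no guessing about what the code means.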

@lollol The essential difference between a decompiler and a disassembler is that a disassembler simply turns one representation of machine code into another. A decompiler, on the other hand, must run complex analysis on the machine code to guess what meaning it could represent, and thus guess a high-level function that could theoretically compile to that machine code. This is an incredibly complicated task compared to disassembling an object file. – fuz May 15 '19 at 23:52
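Here is a taste of the guesswork a decompiler needs, for example finding function boundaries in a stripped x86-64 binary. This naive heuristic treats every occurrence of the classic frame-pointer prologue `push rbp; mov rbp, rsp` (bytes 55 48 89 E5) as a function start. The function name is hypothetical and this is not how any particular decompiler works. Compilers routinely omit the frame pointer, and the same bytes can appear inside data or other instructions, so real tools need much deeper analysis.

```c
#include <stddef.h>
#include <string.h>

/* Classic x86-64 frame-pointer prologue: push rbp; mov rbp, rsp */
static const unsigned char prologue[4] = {0x55, 0x48, 0x89, 0xE5};

/* Store into starts[] the offsets where the prologue occurs in code[0..len),
   up to max_starts entries. Returns the number of candidates found.
   Functions without this prologue (e.g. leaf functions built with
   -fomit-frame-pointer) are silently missed. */
size_t guess_function_starts(const unsigned char *code, size_t len,
                             size_t *starts, size_t max_starts) {
    size_t found = 0;
    for (size_t i = 0; i + sizeof prologue <= len && found < max_starts; i++) {
        if (memcmp(code + i, prologue, sizeof prologue) == 0)
            starts[found++] = i;
    }
    return found;
}
```

A disassembler never has to make this kind of guess, which is the asymmetry fuz and Peter Cordes describe.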
@Peter I'm not saying that decompilation is an easy thing. My question is about Clang supporting such a thing, and it doesn't matter what the process of transforming asm into LLVM-IR is called. Frankly, I don't care :) I've got your opinion and I'm thankful to you for it, but I'm waiting for other opinions. Maybe others who are close to LLVM development will tell us more. – lol lol May 16 '19 at 07:31
@fuz The previous comment probably applies here as well :) – lol lol May 16 '19 at 07:32
@lollol: my point (and fuz's point) is that decompilation isn't an "easy extra" you might expect to find in a compiler + assembler toolchain like clang / LLVM. There is a disassembler, llvm-objdump. There might possibly be a decompiler bundled with LLVM, but I wouldn't *expect* to find one because it isn't needed as part of clang/LLVM or easily created based on any other parts of LLVM the way a disassembler is. Anyway, my answer is just explaining why `-emit-llvm` doesn't do anything here; I'm not claiming there's no other way to accomplish what you want. – Peter Cordes May 16 '19 at 09:15
Still, there could be a transformation that would turn an assembly file to an IR containing inline assembly. – arrowd Dec 15 '19 at 08:30
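arrowd's suggestion can be sketched as follows. LLVM IR supports module-level inline assembly: `module asm "..."` lines, whose contents are concatenated in order. As far as I know, this is also what clang emits for file-scope `asm(...)` statements in C when given `-emit-llvm`. An assembly file could therefore be wrapped, unanalysed, into textual IR. The hypothetical helper `to_module_asm` converts one assembly line into such a line. It escapes characters the way LLVM IR string literals do: `"`, `\` and non-printable bytes become `\` followed by two hex digits. The resulting IR is valid but opaque; the optimizer learns nothing about the code, so this does not contradict the answer.

```c
#include <stddef.h>
#include <string.h>

/* Append one character, keeping room for the terminating NUL. */
static int put(char *out, size_t outsz, size_t *n, char c) {
    if (*n + 1 >= outsz) return -1;
    out[(*n)++] = c;
    return 0;
}

/* Write `module asm "<line>"` into out. Printable ASCII other than '"' and
   '\\' is copied; every other byte becomes '\\' plus two uppercase hex
   digits, as in LLVM IR string literals. Returns 0 on success, -1 if out is
   too small (out is then not guaranteed to be NUL-terminated). */
int to_module_asm(const char *line, char *out, size_t outsz) {
    static const char hex[] = "0123456789ABCDEF";
    const char *prefix = "module asm \"";
    size_t n = strlen(prefix);
    if (n + 1 >= outsz) return -1;
    memcpy(out, prefix, n);
    for (const unsigned char *p = (const unsigned char *)line; *p; p++) {
        unsigned char c = *p;
        if (c >= 0x20 && c < 0x7F && c != '"' && c != '\\') {
            if (put(out, outsz, &n, (char)c)) return -1;
        } else if (put(out, outsz, &n, '\\') ||
                   put(out, outsz, &n, hex[c >> 4]) ||
                   put(out, outsz, &n, hex[c & 0xF])) {
            return -1;
        }
    }
    if (put(out, outsz, &n, '"')) return -1;
    out[n] = '\0';
    return 0;
}
```

Running every line of funcs.s through it would give an IR module whose only content is module-level assembly. Depending on the target, a target triple line may also be needed.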
