Determine the reg that an LLVM IR instruction assigns to

Question

LLVM Instruction allows you to determine the operator and operands. How can you determine the name of the reg that the instruction is assigning to?

This question, How to tell if LLVM Instruction has a Left-Hand Side, asks whether there is a way to determine if an instruction has a LHS assignment, and the answer is "almost always". But how do we determine its name? E.g. how do we differentiate %1 = xor i8 %2, i8 %3 from %5 = xor i8 %2, i8 %3?

UPDATE

To illustrate, the following C compiles to the following IR:

int c1(int a, int b, int c) {
    int d, e, f;
    if (a < b && b >= c) {
   ...

How do I determine that the first instruction of c1 assigns to %4?

; Function Attrs: norecurse nounwind optsize readnone uwtable
define dso_local i32 @c1(i32 %0, i32 %1, i32 %2) local_unnamed_addr #1 {
  %4 = icmp sge i32 %0, %1
  ...

Your questions somehow sound like variations over "how can I reconstruct the state of these objects that I have in RAM from their human-readable output?" Am I right to assume that you want to use LLVM, but don't want to use the programming language LLVM is written in, and so you try detours? You seem to get a fair amount of pain on your detours. — arnt, Jan 30 '22 at 19:37
@arnt Pretty close. I want to analyze LLVM IR. I'm not compiling, just doing program analysis. I'm doing the analysis in Python, but was surprised to learn that Python bindings don't expose IR very well (only creating it). The standard LLVM IR emitted is regular enough that regexen could work. I could, as you seem to suggest, stop my Python analysis program and create a C++ parser to parse IR to Python or JSON, but that seems to be a big detour. — SRobertJames, Jan 30 '22 at 20:07
Well, the biggest chunk of LLVM is passes, and analysing IR is what all passes do: Each pass analyses something, and most of them then make some changes based on the analysis. Since there are more than 200 passes in LLVM itself and many more outside, I dare say that you'll find better and higher-level helpers in the C++ classes they use than in python. If you want to use Python, consider parsing .bc rather than .ll. — arnt, Jan 31 '22 at 18:00

score 3 · Accepted Answer · answered Jan 30 '22 at 04:31

The entirety of %name = add i32 %lhs, %rhs is a single Instruction. The name is a string retrieved by calling myInstruction->getName(). If the instruction has no name, getName() returns an empty string; the numbers you see in printed IR are assigned by the printer, starting at zero, as a running tally that is calculated only while printing and is not stored on the instruction.

In your example %1 = xor i8 %2, i8 %3 is one Instruction -- its own C++ object with an address in memory -- and %5 = xor i8 %2, i8 %3 is a different C++ llvm::Instruction object in memory.
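To make the "running tally" concrete, here is a small Python toy model. It is not LLVM's code: the function `number_unnamed` and its data layout are made up for illustration. It assumes, as the comments below confirm, that unnamed arguments, unnamed basic blocks and unnamed value-producing instructions all draw from one per-function counter in order of appearance, and that instructions with no result (a `store`, say) get no number.

```python
from itertools import count


def number_unnamed(args, blocks):
    """Toy model of print-time numbering for one function.

    args   -- list of argument names, None meaning "unnamed"
    blocks -- list of (block_name, instructions); each instruction is a
              (name, produces_value) pair, with None meaning "unnamed"
    Returns the printed names in the same shape as the input.
    """
    tally = count(0)  # one counter shared by the whole function

    def printed(name):
        # A real name is printed as-is; an unnamed value takes the next number.
        return f"%{name}" if name is not None else f"%{next(tally)}"

    out_args = [printed(a) for a in args]
    out_blocks = []
    for block_name, insts in blocks:
        label = printed(block_name)  # an unnamed block consumes a number too
        names = [printed(n) if produces_value else None
                 for n, produces_value in insts]  # void results get no number
        out_blocks.append((label, names))
    return out_args, out_blocks


# Shape of @c1 from the question: three unnamed arguments, an unnamed entry
# block, and an unnamed icmp as the first instruction.
c1_args, c1_blocks = number_unnamed(
    args=[None, None, None],
    blocks=[(None, [(None, True)])],
)
print(c1_args)    # ['%0', '%1', '%2']
print(c1_blocks)  # [('%3', ['%4'])]
```

Under these assumptions the entry block takes `%3`, which is why the first instruction prints as `%4`. Because the numbers are a by-product of printing, they shift whenever an unnamed value is added or removed earlier in the function, so they are poor long-term identifiers.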

answered Jan 30 '22 at 04:31

Nick Lewycky


Does that mean if I see an instruction `xor i8 %2, i8 %3`, and I want to find out where `%2` was defined (assigned to), I need to go to the beginning of the func and keep a tally of name-less operations? This seems to me error-prone; wouldn't it be more robust to treat `%2` just like any other name, and record it in the IR, just like `%n = ` is recorded? – SRobertJames Jan 30 '22 at 05:56
Please see update to Q. where a simple test suggests it's not a simple tally (`%3` is skipped). – SRobertJames Jan 30 '22 at 06:04
Perhaps the first basic block always gets the next temp var as its (implicit) name? That would make the first block's implicit label `%3`, and thus the next temp is `%4`. Is that correct? If so, it seems to me very messy. Is there an opt pass that can replace every temp name with a permanent name? I'm using LLVM IR for program analysis, not compilation, and I want to ensure I'm looking at the _correct_ regvars and blocks. – SRobertJames Jan 30 '22 at 06:12
@SRobertJames "Does that mean if I see an instruction `xor i8 %2, i8 %3`, and I want to find out where `%2` was defined (assigned to), I need to go to the beginning of the func and keep a tally of name-less operations?" If you see that instruction, its first operand will be an `Instruction` object that has no name. If you somehow figured out that its "implicit name" is "%2" and you then go back to the beginning and take the unnamed instruction with the number 2, then all you'd get is the `Instruction` object that you already had to begin with. So no, don't do that. – sepp2k Jan 30 '22 at 13:53
@SRobertJames As for your other questions: Yes, unnamed blocks are numbered in the same way and using the same tally, and that's why it "skips" a number sometimes. And no, as far as I know, there is no pass that assigns names to everything. This shouldn't be necessary, because there should be no need to look up instructions by name: I don't see a situation where you'd have the name of an instruction without already having access to the instruction itself. – sepp2k Jan 30 '22 at 13:56
LLVM maintains a "use-def" graph. `%foo = xor i8 %2, i8 %3` uses %2 as an operand, so you simply ask `myInstruction->getOperand(0)` to get the `Value*` that is `%2`. You can also find who is using a given value with `uses()` that returns an iterable range. If you want strings for your instructions try `opt -instnamer`. – Nick Lewycky Jan 31 '22 at 05:47
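The use-def idea in the last comment can be mimicked in a few lines of Python. This is a toy model, not the LLVM API: the `Value` class and its `operands` and `users` fields are invented here and only echo the roles of `getOperand()` and `uses()`.

```python
class Value:
    """Toy stand-in for an IR value; hypothetical, not llvm::Value."""

    def __init__(self, opcode, *operands, name=None):
        self.opcode = opcode
        self.name = name              # optional; plays no part in lookups
        self.operands = list(operands)
        self.users = []               # values that use this one as an operand
        for op in self.operands:
            op.users.append(self)     # keep the reverse (def-use) edge


a = Value("argument")
b = Value("argument")
cmp_inst = Value("icmp sge", a, b)    # unnamed, like %4 in the question
sel = Value("select", cmp_inst, a, b, name="foo")

# Finding where an operand was defined is a reference hop, not a name search.
assert sel.operands[0] is cmp_inst
# Going the other way: who uses `a`?
print([u.opcode for u in a.users])    # ['icmp sge', 'select']
```

The point of the sketch is that an operand already is the defining object, so a printed number such as `%2` is never needed to get from a use to its definition.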
