23

The operands for an llvm::User (e.g. instruction) are llvm::Values.

After the mem2reg pass, variables are in SSA form, and the names they had in the original source code are lost. Value::getName() is only set for some values; for most variables, which are intermediaries, it's not set.

The instnamer pass can be run to give all the variables names like tmp1 and tmp2, but this doesn't capture where they originally come from. Here's some LLVM IR beside the original C code:

[Image: LLVM IR shown beside the original C code]

I am building a simple HTML page to visualise and debug some optimisations I am working on, and I want to show the SSA variables in namever notation (source name plus SSA version number), rather than just temporary instnamer names. It's just to aid readability.

I am getting my LLVM IR from clang with a commandline such as:

 clang -g3 -O1 -emit-llvm -o test.bc -c test.c

There are calls to llvm.dbg.declare and llvm.dbg.value in the IR; how do you turn these into the original source-code names and SSA version numbers?

So how can I determine the original variable name (or named-constant name) from an llvm::Value? Debuggers must be able to do this, so how can I?

Will
  • 3
    Which program did you use to create such a nice code assembly | source comparison? – Jack L. Jan 31 '14 at 22:53
  • 6
    @JackL. I quickly wrote it myself. It's just a JavaScript canvas. When someone earns 500 pts giving human-readable namever labels to the Values, I might even release it hint hint ;) – Will Jan 31 '14 at 23:03
  • @Will Did you end up releasing your comparison tool? It would be tremendously useful to many people, I suspect. – ransford Apr 24 '14 at 19:32
  • @ransford afraid not. And, sadly, I never did get much nearer going from intermediary name to SSAnum either. My project ran into other difficulties such as LLVM not preserving pointers, which I understand has bitten lots of people wanting precise GC and porting to VLIWs etc too :( – Will Apr 25 '14 at 08:00

4 Answers

13

This is part of the debug information that's attached to LLVM IR in the form of metadata. Documentation is here. An old blog post with some background is also available.


$ cat  > z.c
long fact(long arg, long farg, long bart)
{
    long foo = farg + bart;
    return foo * arg;
}

$ clang -emit-llvm -O3 -g -c z.c
$ llvm-dis z.bc -o -

Produces this:

define i64 @fact(i64 %arg, i64 %farg, i64 %bart) #0 {
entry:
  tail call void @llvm.dbg.value(metadata !{i64 %arg}, i64 0, metadata !10), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %farg}, i64 0, metadata !11), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %bart}, i64 0, metadata !12), !dbg !17
  %add = add nsw i64 %bart, %farg, !dbg !18
  tail call void @llvm.dbg.value(metadata !{i64 %add}, i64 0, metadata !13), !dbg !18
  %mul = mul nsw i64 %add, %arg, !dbg !19
  ret i64 %mul, !dbg !19
}

With -O0 instead of -O3, you won't see llvm.dbg.value, but you will see llvm.dbg.declare.
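
To make the link in the -O3 listing concrete: each llvm.dbg.value call pairs an SSA value (for example %add) with a metadata node describing a source variable (for example !13). The following is a minimal sketch that recovers those pairs from textual IR. It assumes the old `metadata !{i64 %x}` syntax shown above, works on plain strings rather than the LLVM API, and `collectDbgValueBindings` is a hypothetical helper name, not an LLVM function.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): scans textual IR in the old
// `metadata !{i64 %x}` syntax and records, for each llvm.dbg.value call,
// which SSA value is bound to which variable metadata node.
std::map<std::string, std::string>
collectDbgValueBindings(const std::vector<std::string>& irLines) {
  std::map<std::string, std::string> bindings;  // e.g. "%add" -> "!13"
  for (const std::string& line : irLines) {
    if (line.find("@llvm.dbg.value(") == std::string::npos) continue;

    // The bound value is the %name inside the first `!{ ... }` wrapper.
    std::size_t open = line.find("!{");
    std::size_t close = line.find('}', open);
    if (open == std::string::npos || close == std::string::npos) continue;
    std::size_t pct = line.find('%', open);
    if (pct == std::string::npos || pct > close) continue;  // constant, no SSA name
    std::string value = line.substr(pct, close - pct);

    // The variable descriptor is the last `!N` before the call's closing paren.
    std::size_t end = line.find(')', close);
    if (end == std::string::npos) continue;
    std::size_t bang = line.rfind('!', end);
    if (bang == std::string::npos || bang < close) continue;
    bindings[value] = line.substr(bang, end - bang);
  }
  return bindings;
}
```

Resolving a node such as !13 to a source name still requires reading the corresponding metadata definition, which the listing above does not show.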

Eli Bendersky
  • I think theory and practice diverge :( I've never had any llvm.dbg.value calls emitted by clang. The documentation I had read before I asked on SO. – Will Jan 28 '14 at 20:14
  • 2
    @Will: clang does not emit `llvm.dbg.value`. Optimizations emit them when they place values into registers (instead of the more easily accessible stack slots). – Eli Bendersky Jan 28 '14 at 21:02
  • Well ok, then how do you get mem2reg or whatever to emit them? And would they be the secret sauce that lets me turn Values into sourcecode names, and if so, how? – Will Jan 29 '14 at 07:04
  • @Will: if you take a non-trivial C function and emit LLVM IR from it with clang -g (debug info enabled), you'll see debug information in it. llvm.dbg.declare links stack values to original C objects. `mem2reg` may then create some llvm.dbg.value intrinsics; not all LLVM-level values have direct mapping to original C objects, of course - some are just temporaries, ABI-related values, C++ related lowering, etc. – Eli Bendersky Jan 29 '14 at 14:40
  • Actually, that's pretty much *exactly* what I've been doing. I just don't have any llvm.dbg.values in the emitted bc. I've added more info to my question, and I'm waiting for the bounty period to start in the hope someone actually works out the actual code to go from Value to name/version. – Will Jan 29 '14 at 19:44
  • `clang -O0` generates unoptimized code, which won't have `llvm.dbg.value` because all the locals are on the stack. You need to generate optimized code to see `llvm.dbg.value`. – Eli Bendersky Jan 29 '14 at 20:07
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/46371/discussion-between-will-and-eli-bendersky) – Will Jan 29 '14 at 20:11
11

Given a Value, you can get its variable name by traversing all the llvm.dbg.declare and llvm.dbg.value calls in the enclosing function, checking whether any of them refers to that value, and if so, returning the DIVariable that the intrinsic call associates with the value.

So, the code should look something like (roughly, not tested or even compiled):

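// Assumes the LLVM 3.4-era API and header layout; later releases moved
// these headers and replaced the DIVariable wrapper.
#include "llvm/ADT/StringRef.h"
#include "llvm/DebugInfo.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/InstIterator.h"

using namespace llvm;

// Returns the function that contains V, or NULL for values that do not
// live inside a function (globals, constants).
const Function* findEnclosingFunc(const Value* V) {
  if (const Argument* Arg = dyn_cast<Argument>(V)) {
    return Arg->getParent();
  }
  if (const Instruction* I = dyn_cast<Instruction>(V)) {
    return I->getParent()->getParent();
  }
  return NULL;
}

// Scans every instruction in F for a debug intrinsic that refers to V and
// returns the variable metadata attached to it, or NULL if there is none.
const MDNode* findVar(const Value* V, const Function* F) {
  for (const_inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
    const Instruction* I = &*Iter;
    if (const DbgDeclareInst* DbgDeclare = dyn_cast<DbgDeclareInst>(I)) {
      if (DbgDeclare->getAddress() == V) return DbgDeclare->getVariable();
    } else if (const DbgValueInst* DbgValue = dyn_cast<DbgValueInst>(I)) {
      if (DbgValue->getValue() == V) return DbgValue->getVariable();
    }
  }
  return NULL;
}

StringRef getOriginalName(const Value* V) {
  // TODO handle globals as well

  const Function* F = findEnclosingFunc(V);
  if (!F) return V->getName();

  // Values with no debug intrinsic are temporaries with no source name.
  const MDNode* Var = findVar(V, F);
  if (!Var) return "tmp";

  return DIVariable(Var).getName();
}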

As you can see above, I was too lazy to add handling of globals, but it's not that big a deal. It requires iterating over all the globals listed under the current compile unit's debug info (use M.getNamedMetadata("llvm.dbg.cu") to get a list of all the compile units in the current module), checking which one matches your variable (via the getGlobal method), and returning its name.

However, keep in mind the above will only work for values directly associated with original variables. Any value that is a result of any computation will not be properly named this way; and in particular, values that represent field accesses will not be named with the field name. This is doable but requires more involved processing - you'll have to identify the field number from the GEP, then dig into the type debug information for the struct to get back the field name. Debuggers do that, yes, but no debugger operates in LLVM IR land - as far as I know even LLVM's own LLDB works differently, by parsing the DWARF in the object file into Clang types.
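
For the name-plus-version labels the question asks for, one possible scheme is to number the values bound to each source variable in the order their debug bindings are encountered. This is only a sketch under that assumption: LLVM does not provide such numbering, encounter order is not the same as a true SSA renaming order, and `labelWithVersions` is a hypothetical helper that works on plain strings rather than llvm::Value objects.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not an LLVM API). Input: (SSA value, source variable
// name) pairs in the order their debug bindings appear in the function.
// Output: value -> name followed by a per-variable version, starting at 1.
std::map<std::string, std::string>
labelWithVersions(const std::vector<std::pair<std::string, std::string>>& bindings) {
  std::map<std::string, int> nextVersion;     // per-variable counter
  std::map<std::string, std::string> labels;  // e.g. "%add" -> "foo1"
  for (const auto& binding : bindings) {
    const std::string& value = binding.first;
    const std::string& name = binding.second;
    if (labels.count(value)) continue;  // keep the first label a value received
    int version = ++nextVersion[name];
    labels[value] = name + std::to_string(version);
  }
  return labels;
}
```

Values that never appear in a debug binding get no label here, which matches the limitation described above: computed temporaries still have to be traced back by other means.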

Will
Oak
  • 1
    Very nice and very similar to how I've been finding proper names (although I've been making a reverse map of the Function's symbol table to speed things up slightly). The big problem is in the *however* section, though; almost all variables in IR are temporaries, although as a human you can follow them back and see where they come from. – Will Jan 31 '14 at 23:01
  • It looks like it no longer works (debug values had a major rework shortly after this answer and now they don't return Value * among other things) and I'm not sure what is the correct way to do this now. – Dan M. Jan 30 '19 at 14:16
1

If you are using a recent version of Clang, some of the other approaches will not work. Instead, pass the -fno-discard-value-names flag to clang. This makes the llvm::Values keep their names instead of having them discarded.

JKRT
0

I had a similar requirement, converting the IR into "SSA variables as VarNamever notation". The following documentation and links helped me:
1) https://releases.llvm.org/3.4.2/docs/tutorial/LangImpl7.html
2) LLVM opt mem2reg has no effect

Hope this helps the community!!!

Amit
  • 2
    Hi, the links were helpful but I was still unable to achieve this. Could you please elaborate on how you managed to get the original variable names? Thanks – nicolas-mosch May 01 '19 at 09:54