I have a simple LLVM pass that renames every function defined within the current translation unit (i.e., the source file in question, after all preprocessing steps have taken place). My pass is as follows:

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/TypeFinder.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/GlobalValue.h"

using namespace llvm;

namespace {

  struct FunctionRename : public ModulePass {
    static char ID; // Pass identification
    FunctionRename() : ModulePass(ID) {}

    bool runOnModule(Module &M) override {
      // Rename all functions
      for (auto &F : M) {
        StringRef Name = F.getName();
        // Leave library functions alone because their presence or absence
        // could affect the behaviour of other passes.
        if (F.isDeclaration())
          continue;
        F.setLinkage(GlobalValue::LinkOnceAnyLinkage);
        F.setName(Name + "_renamed");
      }
      return true;
    }
  };
}

char FunctionRename::ID = 0;
static RegisterPass<FunctionRename> X("functionrename", "Function Rename Pass");
// ===-------------------------------------------------------==//
//
// Function Renamer - Renames all functions
//

After running the pass over a bitcode file, file.bc, I write the result to a new file, file_renamed.bc, as follows:

opt -load /path/to/libFunctionRenamePass.so -functionrename < file.bc > file_renamed.bc

I then attempt to link the two files as follows:

llvm-link file.bc file_renamed.bc -o file_linked.bc

However, I still get symbol clashes when the initial bitcode file was generated from C++ source files that involve constructors and destructors. My expectation was that the line

F.setLinkage(GlobalValue::LinkOnceAnyLinkage)

would prevent symbol clashes occurring for any symbols defined in file.bc and file_renamed.bc.

What am I doing wrong?

UnchartedWaters

2 Answers

When I tried running your code on a sample bitcode file, the llvm-link step failed due to the global variables:

ERROR: Linking globals named 'my_global': symbol multiply defined!

After I added a second loop to the runOnModule routine to process the global variables, the llvm-link step succeeded and the resulting code linked.

for (auto git = M.global_begin(), get = M.global_end(); git != get; ++git)
{
   GlobalValue* gv = &*git;
   gv->setLinkage(GlobalValue::LinkOnceAnyLinkage);
}

However, my simple test of C++ code with constructors worked both with and without this change.
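To illustrate why the global variable clashes, here is a toy model in plain C++. It does not use the LLVM API: `Symbol`, `ToyModule`, `renameDefinedFunctions`, and `clashingDefinitions` are hypothetical names made up for this sketch, and linkage is ignored entirely (every definition is assumed to have ordinary external linkage). It mimics a pass that renames only defined functions, then lists the names that are still defined in both the original and the renamed module.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for IR symbols; these are NOT LLVM types.
enum class Kind { Function, GlobalVariable };

struct Symbol {
  Kind kind;
  std::string name;
  bool isDefinition; // false models a declaration, e.g. a library function
};

using ToyModule = std::vector<Symbol>;

// Mimics the pass in the question: only defined functions get the suffix.
ToyModule renameDefinedFunctions(ToyModule m) {
  for (Symbol &s : m)
    if (s.kind == Kind::Function && s.isDefinition)
      s.name += "_renamed";
  return m;
}

// Returns the names that have a definition in both modules.
std::set<std::string> clashingDefinitions(const ToyModule &a,
                                          const ToyModule &b) {
  std::set<std::string> definedInA, clashes;
  for (const Symbol &s : a)
    if (s.isDefinition)
      definedInA.insert(s.name);
  for (const Symbol &s : b)
    if (s.isDefinition && definedInA.count(s.name))
      clashes.insert(s.name);
  return clashes;
}
```

For a module with a defined `main`, a declared `printf`, and a defined `my_global`, calling `clashingDefinitions(original, renameDefinedFunctions(original))` in this model returns only `my_global`, which matches the llvm-link error above.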

Brian

My full solution is as follows:

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/TypeFinder.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

namespace {

  struct FunctionRename : public ModulePass {
    static char ID; // Pass identification
    FunctionRename() : ModulePass(ID) {}

    bool runOnModule(Module &M) override {

      for (auto it = M.global_begin(); it != M.global_end(); ++it)
      {
        GlobalVariable& gv = *it;
        if (!gv.isDeclaration())
          gv.setLinkage(GlobalValue::LinkerPrivateLinkage);
      }

      for (auto it = M.alias_begin(); it != M.alias_end(); ++it)
      {
        GlobalAlias& ga = *it;
        if (!ga.isDeclaration())
          ga.setLinkage(GlobalValue::LinkerPrivateLinkage);
      }

      // Rename all functions
      for (auto &F : M) {
        StringRef Name = F.getName();
        // Leave library functions alone because their presence or absence
        // could affect the behaviour of other passes.
        if (F.isDeclaration())
          continue;
        F.setLinkage(GlobalValue::WeakAnyLinkage);
        F.setName(Name + "_renamed");
      }
      return true;
    }
  };
}

char FunctionRename::ID = 0;
static RegisterPass<FunctionRename> X("functionrename", "Function Rename Pass");
// ===-------------------------------------------------------==//
//
// Function Renamer - Renames all functions
//

In the loop to process the functions, for(auto &F : M) { ... }, I prefer to use WeakAnyLinkage instead of LinkOnceAnyLinkage for the following reasons.

Globals with LinkOnceAnyLinkage are, as the name suggests, merged with other symbols of the same name at link time, and unreferenced globals with this linkage are allowed to be discarded.

Globals with WeakAnyLinkage have the same merging semantics as globals with LinkOnceAnyLinkage, except that unreferenced globals with WeakAnyLinkage may not be discarded.
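The difference between the two linkages can be sketched as a small decision table in plain C++. This is a simplified model, not llvm-link's actual algorithm and not the `llvm::GlobalValue::LinkageTypes` enum: the `Linkage` enum and the functions `isMergeable`, `mayDiscardIfUnreferenced`, and `linkSameName` are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <string>

// Simplified linkage categories for this toy model.
enum class Linkage { External, LinkOnceAny, WeakAny };

// Linkages whose definitions may be merged with a same-named definition.
bool isMergeable(Linkage l) {
  return l == Linkage::LinkOnceAny || l == Linkage::WeakAny;
}

// Only linkonce definitions may be dropped when nothing references them.
bool mayDiscardIfUnreferenced(Linkage l) {
  return l == Linkage::LinkOnceAny;
}

// Outcome of linking two definitions that share a name.
std::string linkSameName(Linkage a, Linkage b) {
  if (!isMergeable(a) && !isMergeable(b))
    return "error: symbol multiply defined";
  if (!isMergeable(a))
    return "keep first";  // the external definition overrides the mergeable one
  if (!isMergeable(b))
    return "keep second"; // same rule, other operand
  return "merge";         // both mergeable: a single copy survives
}
```

In this model, two external definitions of the same name produce the "multiply defined" error, while any pairing that involves a weak or linkonce definition links cleanly. The only difference between the two is `mayDiscardIfUnreferenced`, which is the reason given above for choosing WeakAnyLinkage.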

Also, in the two loops to process globals and aliases, I use LinkerPrivateLinkage, because I do not want the globals in file_renamed.bc to be accessible by any objects outside this module.

Finally, the loop that processes aliases is necessary (at least in my environment) to avoid symbol clashes related to complete-object constructors and destructors (i.e., the C1 constructors and D1 destructors of the Itanium C++ ABI).

UnchartedWaters