
I have an application that depends on many libraries. I am building everything from source on an Ubuntu machine. I want to remove any function or class that is not required by the application. Is there any tool to help with that?

P.S. I want to remove source code from the library not just symbols from object files.

  • Looks possible, see [here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53570). – Paul Sanders May 24 '18 at 06:15
  • You don't need to change the source code to remove unused functions, methods or classes from your binary. My answer is intended to show you how to do that with a simple compiler switch. Please let me know if anything needs clarification. – Paul Sanders May 25 '18 at 03:31
  • I understand your point and appreciate your help. However I wanted to optimize an open-source library, but I don't require the complete library functionality, so if I can strip the source code then it will be easier to optimize. – Ashok Vishnoi May 25 '18 at 06:47
  • Why will it be easier to optimise? What I don't understand is why you think that removing code from the source files will give different results to having the compiler and linker do it for you. – Paul Sanders May 25 '18 at 08:13
  • The original source code is very big. (~200k LOC). I need around 20k LOC. – Ashok Vishnoi May 28 '18 at 08:32
  • What is LOC? And if you are referring to the size of your binary, _try that flag_. It should remove any unreachable code and that might be all you need. Hacking around with the source of the library sounds like a bad idea - you're almost certain to break something. – Paul Sanders May 28 '18 at 08:46
  • LOC -> Lines of Code. I want to remove the unused code, optimize what remains and maintain it. Maintaining the whole library is too much for such a small piece of functionality. – Ashok Vishnoi May 28 '18 at 11:22
  • Oh OK, I didn't realise you were contemplating such radical surgery, sorry. I think I'd just use a decent IDE. Mine has lots of 'Intellisense' tools that would help with that. – Paul Sanders May 28 '18 at 11:47

3 Answers


I have now researched this a bit in the context of my own project and decided this was worth a full answer rather than just a comment. This answer is based on Apple's toolchain on macOS (which uses clang, rather than gcc), but I think things work in much the same way for both.

The key to this is enabling 'link time optimization' when building your libraries and executable(s). The mechanics of this are actually very simple: pass -flto to gcc on the command line, both when compiling and when linking (do the link through the gcc driver so the option reaches the linker). This has two effects:

  • Code (functions / methods) in object files or archives that is never called is omitted from the final executable.
  • The linker performs the sort of optimisations that the compiler can perform (such as function inlining), but with knowledge that extends across compilation unit boundaries.

It won't help you if you are linking against a shared library, but it might help if that shared library links with other (static) libraries which contain code that the shared library never calls.
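One way to check the first effect on your own build is to compare the code symbols in the executable built once without and once with -flto. The following is a minimal sketch, assuming you have saved the plain-text output of `nm` for each build; the helper names `code_symbols` and `dropped_symbols` and the sample symbol tables are made up for illustration.

```python
def code_symbols(nm_output):
    """Return the names nm reports as defined code (symbol types T, t, W)."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split(maxsplit=2)
        # Defined symbols have three fields: address, type, name.
        # Undefined ones (e.g. "U puts") have only two and are skipped.
        if len(fields) == 3 and fields[1] in ("T", "t", "W"):
            names.add(fields[2])
    return names


def dropped_symbols(nm_before, nm_after):
    """List code symbols present in the first build but not in the second."""
    return sorted(code_symbols(nm_before) - code_symbols(nm_after))


if __name__ == "__main__":
    # Hypothetical nm output; in practice read it from files produced by
    # running nm on each executable.
    before = (
        "0000000000001139 T main\n"
        "0000000000001150 T used_helper\n"
        "0000000000001170 T unused_helper\n"
        "                 U puts\n"
    )
    after = (
        "0000000000001139 T main\n"
        "0000000000001150 T used_helper\n"
        "                 U puts\n"
    )
    print(dropped_symbols(before, after))
```

A symbol can also disappear because it was inlined rather than removed, so treat the list as a hint about what the linker did, not as proof that the code is unused.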

On the upside, this reduced the size of my final executable by about 5%, which I'm pleased about. YMMV.

On the downside, my object files roughly doubled in size and sometimes link times increased dramatically (by something like a factor of 100). Then, if I re-linked, it was much faster. This behaviour might be a peculiarity of Apple's toolchain however. Perhaps it is stashing away some build intermediates somewhere on the first link. In any case, if you only enable this option for release builds it should not be a major issue.

There are more details of the full set of gcc command line options that control optimisation here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. Search that page for flto to narrow down your search.

And for a glimpse behind the scenes, see: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html

Edit:

A bit more information about link times. Apple's linker creates some huge files in a directory called LTOCache when you link. I've not seen these before today so these look to be the build intermediates that speed up linking second time around. As for my initial link being so slow, this may in part be due to the fact that, in my case, these are created on an SMB server. But then again, the CPU was maxed out so maybe not.

Paul Sanders

The standard `strip` utility was created exactly for this.

Chugaister
  • That just discards _symbols_. The OP (and indeed I) want to omit code from the final executable that is never called. – Paul Sanders May 24 '18 at 11:45

OK, now that I understand the OP's requirements better, I have another answer that I think might suit his needs better. I think the way to tackle this is with a code coverage tool. After all, the problem is identifying what you can safely get rid of. Actually stripping it out is easy.

My IDE (Visual Studio) has one of these built in but I think the OP is using gcc so the first port of call appears to be gcov. There are a number of commercial options, but they are expensive. There's also a potentially useful post here.

The other thing you need, of course, is a program that exercises all the parts of the library that you want to keep to give you a coverage report to work from, but it sounds like the OP already has that. A good IDE will also help as it makes navigating around the code so much easier. In Visual Studio, I find Jump to Definition and quick and easy 'bookmarking' to be key features.
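With gcov, the coverage report is an annotated copy of each source file in which every line is prefixed with `count:line_number:`, and executable lines that never ran carry `#####` (or `=====` for exception-only paths) in place of the count. The following is a minimal sketch for pulling those lines out as removal candidates, assuming that format; the helper name `unexecuted_lines` and the sample report are made up for illustration.

```python
def unexecuted_lines(gcov_text):
    """Return (line_number, source) pairs for lines gcov marks as never run."""
    missed = []
    for line in gcov_text.splitlines():
        # Each annotated line looks like "count:lineno:source text".
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue
        count, lineno, source = parts
        # "#####" marks an executable line that never ran; "=====" marks one
        # reachable only through exceptional paths. "-" means not executable.
        if count.strip() in ("#####", "====="):
            missed.append((int(lineno), source.strip()))
    return missed


if __name__ == "__main__":
    # Hypothetical contents of a lib.c.gcov file.
    sample = (
        "        -:    0:Source:lib.c\n"
        "        1:    3:int used(int x) { return x * 2; }\n"
        "    #####:    5:int unused(int x) { return x + 1; }\n"
    )
    for number, source in unexecuted_lines(sample):
        print(number, source)
```

A line flagged this way only means the test run never reached it, so each candidate still needs a rebuild after removal to confirm nothing else refers to it.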

Paul Sanders
  • Thanks. It did help me reduce the size. However, since coverage tools only provide runtime information, I have to remove the code that was not executed and then rebuild to check for compilation errors. For example, if a function is called from the `else` part of an `if..else` statement and only the `if` part ran during execution, the function shows as not executed in the coverage report, yet it cannot be removed. – Ashok Vishnoi Jun 03 '18 at 04:12
  • Yes, that makes sense. Maybe you can go further and remove the `else` part also (you did say you wanted to get rid of as many source lines as possible). – Paul Sanders Jun 03 '18 at 05:11