
Suppose an executable is poorly written and contains a lot of dead code, so that it refers to thousands of external functions (i.e. in .so files) but only hundreds of them are actually called at runtime. Will LD_BIND_NOW=1 then be worse than leaving LD_BIND_NOW unset, because the Procedure Linkage Table will contain 900 useless function addresses? By worse I mean in terms of memory footprint and performance (I don't know whether the lookup is O(n)).

I am trying to find out whether setting LD_BIND_NOW=1 will help, compared with leaving it unset, in the following situation:
1. The program runs 24 x 5, and what I care about is latency.
2. Saving 1 microsecond counts as a big win in my case. The code paths executed during the lifetime of the program mainly process incoming messages from TCP/UDP/shared memory and then do some computation on them; each of these code paths is very short (e.g. < 10 microseconds) and runs millions of times.

Whether LD_BIND_NOW=1 helps the startup time doesn't matter to me.

Hei
  • Very unclear question. Are you concerned with the overall execution time of a short-running program? Which one? Do you have the source code of that executable? Why do you actually care? Please **edit your question** to improve it a lot and give context and motivation. Finally, did you benchmark (e.g. using [time(1)](http://man7.org/linux/man-pages/man1/time.1.html))? In most cases, it won't matter at all! – Basile Starynkevitch Jan 05 '18 at 07:40
  • @BasileStarynkevitch my source code is proprietary, so I can't post it here... plus it is about 500K lines of code, so there is no point in posting it. Benchmarking is indeed the most accurate way; however, I am looking for a general answer for my situation (just added) rather than a very specific answer that goes through the code line by line. Hope this clarifies things. – Hei Jan 05 '18 at 08:38
  • See references in [this answer](https://stackoverflow.com/a/48109798/841108) – Basile Starynkevitch Jan 05 '18 at 08:40
  • Your question is really unclear, and you look confused. Try to profile and benchmark your program. Otherwise, **edit your question** to improve it, perhaps by giving some [MCVE]. I feel you are using "execution of program" in a wrong way. On Linux it is related to `execve`. Maybe you mean execution of some code in your program (which is very different) – Basile Starynkevitch Jan 05 '18 at 08:51
  • @BasileStarynkevitch I replaced "execution" to avoid the Linux-specific confusion. A concrete minimal example is hard to give, because my question is a general one: could LD_BIND_NOW=1 reduce the execution time of all possible code paths in a program that is so poorly written that it contains a lot of dead code? I am looking for the average case, not a specific case. – Hei Jan 05 '18 at 09:00
  • Follow the references in my answer below, and in [this related answer](https://stackoverflow.com/a/48109798/841108), notably Drepper's paper. – Basile Starynkevitch Jan 05 '18 at 09:00
  • I guess that your performance issues could require some program refactoring (because of accumulated [technical debt](https://en.wikipedia.org/wiki/Technical_debt)) which takes a lot of effort and time. – Basile Starynkevitch Jan 05 '18 at 09:13
  • Yeah, I wish I could refactor. Working on projects for companies is different from working on projects for myself lol. – Hei Jan 05 '18 at 09:32

1 Answer


> saving 1 microsecond is considered big in my case as the executions by the program are all short (e.g. <10 micro)

This is unlikely (or you mean something else). A typical call to execve(2), the system call used to start programs, usually takes several milliseconds. So it is practically impossible for a whole program to run (from execve to _exit(2)) in a few microseconds.

I suggest that your program should not be started more than a few times per second. If the entire program really is very short-lived (so that its process lasts only a fraction of a second), you could consider some other approach (perhaps a server that runs those functions).

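LD_BIND_NOW affects (and slows down) start-up, because the dynamic linker ld-linux(8) then resolves every function symbol at program start instead of resolving each one lazily on its first call. It should not change (except for cache effects) the steady-state execution time of an event loop: once a function has been called, its GOT slot holds the real address, and later calls cost the same in both modes.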
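To make the difference between lazy and immediate binding concrete, here is a toy model in Python. It is only a sketch of the idea, not how ld.so is implemented: `ToyLoader`, `library` and the function names `f0`..`f999` are all made up for illustration, and a dictionary stands in for the GOT slots that the real PLT stubs jump through.

```python
class ToyLoader:
    """Toy model of lazy vs. immediate binding; not how ld.so really works."""

    def __init__(self, library, imported, bind_now=False):
        self.library = library   # name -> callable, stands in for a .so file
        self.resolutions = 0     # number of symbol lookups performed so far
        self.got = {}            # one slot per imported function, in both modes
        for name in imported:
            if bind_now:
                # LD_BIND_NOW=1: resolve every imported symbol at start-up.
                self.got[name] = self._resolve(name)
            else:
                # Lazy binding: the slot first points at a resolver stub.
                self.got[name] = self._make_stub(name)

    def _resolve(self, name):
        self.resolutions += 1
        return self.library[name]

    def _make_stub(self, name):
        def stub(*args):
            target = self._resolve(name)  # one-time cost, paid on first call
            self.got[name] = target       # patch the slot for later calls
            return target(*args)
        return stub

    def call(self, name, *args):
        # A real call site jumps through its own fixed slot; there is no search.
        return self.got[name](*args)


# 1000 imported functions, of which only the first 100 are ever called.
library = {f"f{i}": (lambda x, i=i: x + i) for i in range(1000)}
lazy = ToyLoader(library, list(library))
eager = ToyLoader(library, list(library), bind_now=True)

for _ in range(3):
    for i in range(100):
        assert lazy.call(f"f{i}", 1) == eager.call(f"f{i}", 1)

print(lazy.resolutions, eager.resolutions)  # prints "100 1000"
```

In this model both loaders hold the same 1000 slots, so the table is no bigger with immediate binding. What changes is when the lookups happen: at start-up for all 1000 functions, or on the first call of each of the 100 functions that are actually used.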

See also the references in this related answer (to a different question); they contain detailed explanations relevant to your question.

In short, the setting of LD_BIND_NOW will not significantly affect the time needed to handle each incoming message in a tight event loop.
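If you want to check this on your own workload, one approach is to record per-message latencies and look at the first call separately from the calls that follow it, since any one-time cost lands on the first call. The sketch below only illustrates that bookkeeping in Python; `time_calls`, `summarize` and `handle` are hypothetical names, and `handle` is a trivial stand-in for a real message handler.

```python
import statistics
import time


def time_calls(handler, messages):
    """Return the latency of each handler(msg) call in nanoseconds, in call order."""
    latencies = []
    for msg in messages:
        start = time.perf_counter_ns()
        handler(msg)
        latencies.append(time.perf_counter_ns() - start)
    return latencies


def summarize(latencies):
    """Report the first call separately from the rest (needs at least 2 samples)."""
    first, rest = latencies[0], latencies[1:]
    return {
        "first_ns": first,
        "steady_median_ns": statistics.median(rest),
        "steady_max_ns": max(rest),
    }


# Hypothetical stand-in for a message handler; replace with the real code path.
def handle(msg):
    return sum(msg)


report = summarize(time_calls(handle, [b"abc"] * 1000))
print(report)
```

In the real program the same split would be measured around the native handler (for example with clock_gettime), once with LD_BIND_NOW=1 and once without. With lazy binding, a rarely taken code path pays the resolution cost the first time it calls a function that has not been resolved yet, so the maximum is worth comparing as well as the median.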

Calling functions in shared libraries (which contain position-independent code) might be slightly slower in some cases (by a few percent at most, and probably less on x86-64). You could try static linking, and you might even consider link-time optimization (i.e. compiling and linking all your code, main program and static libraries alike, with -flto -O2 if using GCC).

You might have accumulated technical debt, and you may need a major code refactoring (which takes a lot of time and effort that you should budget for).

Basile Starynkevitch
  • Thanks for your answer. I think I confused you. I was trying to make it clear that the program is long-lived by saying "running 24 x 5", and that each execution is short. I just updated my question with the definition of "execution". Sorry for the lack of clarity. – Hei Jan 05 '18 at 08:49
  • In practice, `LD_BIND_NOW` does not matter in your case – Basile Starynkevitch Jan 05 '18 at 09:07
  • I use static linking for 99% of the libraries the executable uses. -flto is an interesting one. I tried it once, together with -fwhole-program and -O3, with GCC 4.6.x years ago, and for the one code path I measured specifically (because it was important to me) it was slower than without those flags. I feel those flags will optimize some code paths but make others worse. After all, GCC doesn't know which code paths are important, so when optimization is requested it can only make its best guess. – Hei Jan 05 '18 at 09:37