I want to explain my question through a practical problem I met in my project.
I am writing a c library( which behaves like a programmable vi editor
), and i plan to provide a series of APIs ( more than 20 in total ):
void vi_dw(struct vi *vi);
void vi_de(struct vi *vi);
void vi_d0(struct vi *vi);
void vi_d$(struct vi *vi);
...
void vi_df(struct vi *, char target);
void vi_dd(struct vi *vi);
These APIs do not perform core operations, they are just wrappers. For example, I can implement vi_de()
like this:
void vi_de(struct vi *vi){
vi_v(vi); //enter visual mode
vi_e(vi); //press key 'e'
vi_d(vi); //press key 'd'
}
However, if the wrapper is as simple as such, I have to write more than 20 similar wrapper functions.
So, I consider implementing more complex wrappers to reduce the amount:
void vi_d_move(struct vi *vi, vi_move_func_t move){
vi_v(vi);
move(vi);
vi_d(vi);
}
static inline void vi_dw(struct vi *vi){
vi_d_move(vi, vi_w);
}
static inline void vi_de(struct vi *vi){
vi_d_move(vi, vi_e);
}
...
The function vi_d_move()
is a better wrapper function, he can convert a part of similar move operation to APIs, but not all, like vi_f()
, which need another wrapper with a third argument char target
.
I finished explaining the example picked from my project.
The pseudo code above is simper than real case, but is enough to show that:
The more complex the wrapper is, the less wrappers we need, and the slower they will be.(they will become more indirect or need to consider more conditions).
There are two extremes:
use only one wrapper but complex enough to adopt all move operations and convert them into corresponding APIs.
use more than twenty small and simple wrappers. one wrapper is one API.
For case 1, the wrapper itself is slow, but it has more chance resident in cache, because it is often executed(all APIs share it). It's a slow but hot path.
For case 2, these wrappers are simple and fast, but has less chance resident in cache. At least, for any API first time called, a cache miss will happen.(CPU need to fetch instructions from memory, but not L1, L2).
Currently, I implemented five wrappers, each of them are relatively simple and fast. this seems to be a balance, but just seems. I chose five just because I felt the move operation can be divided into five groups naturally. I have no idea how to evaluate it, I don't mean a profiler, I mean, in theory, what main factors should be considered in such case?
In the post end, I want to add more detail for these APIs:
These APIs need to be fast. Because this library is designed as a high performance virtual editor. The delete/copy/paste operation is designed to approach the bare C code.
A user program based on this library seldom calls all these APIs, only parts of them, and usually no more than 10 times for each.
In real case, the size of these simple wrappers are about 80 bytes each, and will be no more than 160 bytes even merged into a single complex one. (but will introduce more if-else branches).
4, As with the situation the library is used, I will take lua-shell
as example(a little off-topic, but some friends want to know why I so care its performance):
lua-shell
is a *nix shell which uses lua
as its script. Its command execution unit(which do forks(), execute()..) is just a C module registered into the lua state machine.
Lua-shell
treats everything as lua
.
So, When user input:
local files = `ls -la`
And press Enter
. The string input is first sent to lua-shell's preprocessor————which convert mixed-syntax to pure lua code:
local file = run_command("ls -la")
run_command()
is the entry of lua-shell's command execution unit, which, I said before, is a C module.
We can talk about libvi
now. lua-shell's preprocessor is the first user of the library I am writing. Here is its relative codes(pseudo):
#include"vi.h"
vi_loadstr("local files = `ls -la`");
vi_f(vi, '`');
vi_x(vi);
vi_i(vi, "run_command(\"");
vi_f(vi, '`');
vi_x(vi);
vi_a(" \") ");
The code above is parts of luashell's preprocessor implementation. After generating the pure lua code, he feeds it to Lua State Machine and run it.
The shell user is sensitive to the time interval between Enter
and a new prompt, and in most case lua-shell needs preprocess script with larger size and more complicate mixed-syntax.
This is a typical situation where libvi
is used.