1

There have been a number of questions on similar topics but none which I found that explore the options in this way.

Often we need to wrap a legacy C-API in C++ to use it's very good functionality while protecting us from the vagaries. Here we will focus just on one element. How to wrap legacy C-functions which accept char* params. The specific example is for an API (the graphviz lib) which accepts many of its params as char* without specifying if that is const or non-const. There appears to be no attempt to modify, but we can't be 100% sure.

The use case for the wrapper is that we want to conveniently call the C++ wrapper with a variety of "stringy" properties names and values, so string literals, strings, const strings, string_views, etc. We want to call both singly during setup where performance is non-critical and in the inner loop, 100M+ times, where performance does matter. (Benchmark code at bottom)

The many ways of passing "strings" to functions have been explained elsewhere.

The code below is heavily commented for 4 options of the cpp_wrapper() function being called 5 different ways.

Which is the best / safest / fastest option? Is it a case of Pick 2?

#include <array>
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

void legacy_c_api(char* s) {
  // just for demo, we don't really know what's here.
  // specifically we are not 100% sure if the code attempts to write
  // to char*. It seems not, but the API is not `const char*` eventhough C
  // supports that
  std::puts(s);
}

// the "modern but hairy" option
void cpp_wrapper1(std::string_view sv) {
  // 1. nasty const_cast. Does the legacy API modifY? It appears not but we
  // don't know.

  // 2. Is the string view '\0' terminated? our wrapper api can't tell
  // so maybe an "assert" for debug build checks? nasty too?!
  // our use cases below are all fine, but the API is "not safe": UB?!
  assert((int)*(sv.data() + sv.size()) == 0);

  legacy_c_api(const_cast<char*>(sv.data()));
}

void cpp_wrapper2(const std::string& str) {
  // 1. nasty const_cast. Does the legacy API modifY? It appears not but we
  //    don't know. note that using .data() would not save the const_cast if the
  //    string is const

  // 2. The standard says this is safe and null terminated std::string.c_str();
  //    we can pass a string literal but we can't pass a string_view to it =>
  //    logical!

  legacy_c_api(const_cast<char*>(str.c_str()));
}

void cpp_wrapper3(std::string_view sv) {
  // the slow and safe way. Guaranteed be '\0' terminated.
  // is non-const so the legacy can modfify if it wishes => no const_cast
  // slow copy?  not necessarily if sv.size() < 16bytes => SBO on stack
  auto str = std::string{sv};
  legacy_c_api(str.data());
}

void cpp_wrapper4(std::string& str) {
  // efficient api by making the proper strings in calling code
  // but communicates the wrong thing altogether => effectively leaks the c-api
  // to c++
  legacy_c_api(str.data());
}

// std::array<std::string_view, N> is a good modern way to "store" a large array
// of "stringy" constants? they end up in .text of elf file (or equiv). They ARE
// '\0' terminated. Although the sv loses that info. Used in inner loop => 100M+
// lookups and calls to legacy_c_api;
static constexpr const auto sv_colours =
    std::array<std::string_view, 3>{"color0", "color1", "color2"};

// instantiating these non-const strings seems wrong / a waste (there are about
// 500 small constants) potenial heap allocation in during static storage init?
// => exceptions cannot be caught... just the wrong model?
static auto str_colours =
    std::array<std::string, 3>{"color0", "color1", "color2"};

int main() {
  auto my_sv_colour  = std::string_view{"my_sv_colour"};
  auto my_str_colour = std::string{"my_str_colour"};

  cpp_wrapper1(my_sv_colour);
  cpp_wrapper1(my_str_colour);
  cpp_wrapper1("literal_colour");
  cpp_wrapper1(sv_colours[1]);
  cpp_wrapper1(str_colours[2]);

  // cpp_wrapper2(my_sv_colour); // compile error
  cpp_wrapper2(my_str_colour);
  cpp_wrapper2("literal_colour");
  // cpp_wrapper2(colours[1]); // compile error
  cpp_wrapper2(str_colours[2]);

  cpp_wrapper3(my_sv_colour);
  cpp_wrapper3(my_str_colour);
  cpp_wrapper3("literal_colour");
  cpp_wrapper3(sv_colours[1]);
  cpp_wrapper3(str_colours[2]);

  // cpp_wrapper4(my_sv_colour);  // compile error
  cpp_wrapper4(my_str_colour);
  // cpp_wrapper4("literal_colour"); // compile error
  // cpp_wrapper4(sv_colours[1]); // compile error
  cpp_wrapper4(str_colours[2]);
}

Benchmark code

Not entirely realistic yet, because work in C-API is minimal and non-existent in C++ client. In the full app I know that I can do 10M in <1s. So just changing between these 2 API abstraction styles looks like it might be a 10% change? Early days...needs more work. Note: that's with a short string which fits in SBO. Longer ones with heap allocation just blow it out completely.

#include <benchmark/benchmark.h>

static void do_not_optimize_away(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

void legacy_c_api(char* s) {
  // do at least something with the string
  auto sum = std::accumulate(s, s+6, 0);
  do_not_optimize_away(&sum);
}

// ... wrapper functions as above: I focused on 1&3 which seem 
// "the best compromise". 
// Then I added wrapper4 because there is an opportunity to use a 
// different signature when in main app's tight loop. 

void bench_cpp_wrapper1(benchmark::State& state) {
  for (auto _: state) {
    for (int i = 0; i< 100'000'000; ++i) cpp_wrapper1(sv_colours[1]);
  }
}
BENCHMARK(bench_cpp_wrapper1);

void bench_cpp_wrapper3(benchmark::State& state) {
  for (auto _: state) {
    for (int i = 0; i< 100'000'000; ++i) cpp_wrapper3(sv_colours[1]);
  }
}
BENCHMARK(bench_cpp_wrapper3);

void bench_cpp_wrapper4(benchmark::State& state) {
  auto colour = std::string{"color1"};
  for (auto _: state) {
    for (int i = 0; i< 100'000'000; ++i) cpp_wrapper4(colour);
  }
}
BENCHMARK(bench_cpp_wrapper4);

Results

-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
bench_cpp_wrapper1   58281636 ns     58264637 ns           11
bench_cpp_wrapper3  811620281 ns    811632488 ns            1
bench_cpp_wrapper4  147299439 ns    147300931 ns            5
Oliver Schönrock
  • 1,038
  • 6
  • 11

2 Answers2

2

Is the string view '\0' terminated?

If it happens to point to null terminated string, then sv.data() may be null terminated. But string view does not need to be null terminated, so one should not assume that it is. Thus cpp_wrapper1 is a bad choice.

Does the legacy API modifY? .. we don't know.

If you don't know whether the API modifies the string, then you cannot use const, so cpp_wrapper2 is not an option.


One thing to consider is whether a wrapper is necessary. Most efficient solution is to pass a char*, which is just fine in C++. If using const strings is a typical operation, then cpp_wrapper3 may be useful - but is it typical considering the operations may modify the string? cpp_wrapper4 is more efficient than 3, but not as efficient as plain char* if you don't already have a std::string.

You can provide all of the options mentioned above as overloads.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • Yes. sv.data() is `'\0'` terminated in all current use cases, but if I use option 1 then will someone pass a non terminated thing at some point? I appreciate the thought of keeping char* in C++. I certainly thought about it. Apart from having to put `// NOLINT` on everything it would also need to be `non-const char*` which means I can't store the colours in `.text`.... Seems to me there are "no really correct or good options here" ...only compromises? Currently I have option1 but intend to probably change to option3 after measuring the performance impact precisely. – Oliver Schönrock Jan 24 '20 at 19:10
2

Correct first, then optimize if needed.

  • wrapper1 has at least two potential instances of undefined behavior: The dubious const_cast, and (in debug versions) possibly accessing an element past the end of an array. (You can create a pointer to one element past the last, but you cannot access it.)

  • wrapper2 also has a dubious const_case, potentially invoking undefined behavior.

  • wrapper3 doesn't rely on any UB (that I see).

  • wrapper4 is similar to wrapper3, but exposes details you're trying to encapsulate.

Start by doing the most correct thing, which is to copy the strings and pass a pointer to the copy, which is wrapper3.

If performance is unacceptable in the tight loop, you can look at alternatives. The tight loop may use only a subset of the interfaces. The tight loop may be heavily biased toward short strings or long strings. The compiler might inline enough of your wrapper in the tight loop that it's effectively a no-op. These factors will affect how (and if) you solve the performance problem.

Alternative solutions might involve caching to reduce the number of copies made, investigating the underlying library enough to make some strategic changes (like changing the underlying library to use const where it can), or by making an overload that exposes the char * and passes it straight through (which shifts the burden to the caller to know what's right).

But all of that is implementation detail: Design the API for usability by the callers.

Adrian McCarthy
  • 45,555
  • 16
  • 123
  • 175
  • That's all sound advice, that I would give too. But we agree? Only compromise answers. No other options which makes sense? I did the benchmarking results above now, did you see them? It's a 15x difference between 1&3 which has a 10-20% impact in my inner loop. – Oliver Schönrock Jan 24 '20 at 20:59
  • You're right that there are about 20 of these interfaces and the tight loop uses precisely one of them. So that may be my compromise. I could provide the one used by the tight lop with an overload of wrapper4 (because it turns out in the tight loop we have the std::string in caller) and wrapper3 for all of them (incl the one used in inner loop), because wrapper3 is a nice clean, easy to use, safe and sane API. Right? – Oliver Schönrock Jan 24 '20 at 21:07
  • Added benchmarking for wrapper4. It's 3x slower than wrapper1 and 5x faster than wrapper3. So i think that puts it in the "doesn't make a substantial difference to inner loop" territory. So wrapper3 for everything. Then a single wrapper4 overload for tight loop. Agree? – Oliver Schönrock Jan 24 '20 at 21:18
  • Happy to report special overload for wrapper4 makes only an immaterial difference in the inner loop ...so have gone with that and the "clean" wrapper3 everywhere else. It appears There are no magic bullets here unfortunately, for a general solution of interfacing C++ to C `char*`, so we have to compromise. Going to accept your answer as it got me thinking in the right direction, ie specialise "only the exact one call" which you need to, and keep the rest of the API as "convenient and clean" as possible. – Oliver Schönrock Jan 24 '20 at 22:20