
I'm considering writing a function that estimates an upper bound on the length of the formatted string that sprintf() or snprintf() would produce.

My approach was to parse the format string to find the various %s, %d, %f, %p args, creating a running sum of strlen()s, itoa()s, and strlen(format_string) to get something guaranteed to be big enough to allocate a proper buffer for snprintf().

I'm aware the following works, but it takes 10X as long: the printf() functions are all very flexible, but very slow because of it.

  char c;
  int required_buffer_size = snprintf(&c, 1, "format string", args...);

Has this already been done, either via the suggested approach or via some other reasonably efficient approach, i.e. one that is 5-50X faster than the sprintf() variants?
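For illustration only, here is a minimal sketch of the kind of estimator described above. The name `estimate_len` is hypothetical, it accepts only bare `%s`, `%d`, `%f`, `%p` and `%%` (no flags, width, or precision), the `%d` bound borrows the `sizeof(type)*3 + 3` rule from the comments, the 320 for `%f` assumes a typical IEEE double, and the 32 for `%p` is an assumed bound because that output is implementation-defined.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns an upper bound on the formatted length
 * (excluding the terminating nul), or -1 for any specifier it does not
 * handle. Strings passed for %s must be non-NULL and nul-terminated. */
long estimate_len(const char *fmt, ...)
{
    long total = 0;
    const char *p;
    va_list ap;

    va_start(ap, fmt);
    for (p = fmt; *p != '\0'; ++p) {
        if (*p != '%') {            /* literal character */
            total += 1;
            continue;
        }
        ++p;                        /* look at the conversion character */
        switch (*p) {
        case '%': total += 1; break;
        case 's': total += (long)strlen(va_arg(ap, const char *)); break;
        case 'd': (void)va_arg(ap, int);
                  total += (long)(sizeof(int) * 3 + 3); break;
        case 'f': (void)va_arg(ap, double);
                  total += 320; break;   /* assumes a typical IEEE double */
        case 'p': (void)va_arg(ap, void *);
                  total += 32; break;    /* assumed bound, not guaranteed */
        default:  va_end(ap); return -1; /* unsupported or truncated spec */
        }
    }
    va_end(ap);
    return total;
}
```

A caller would allocate `estimate_len(...) + 1` bytes and then call snprintf() once. Anything outside the restricted grammar returns -1, so a fallback to the measuring call is still needed for those formats.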

user2548100
  • Takes "10x as long" as what exactly? The function you describe would have to pretty much implement snprintf again to be correct, unless you're willing to restrict it to a subset of the format specifiers... that's the only way you'd come up with a faster alternative. – vanza Feb 18 '14 at 19:35
  • Possible duplicate: http://stackoverflow.com/questions/16647278/minimal-fast-implementation-of-sprintf-for-embedded – Joseph Quinsey Feb 18 '14 at 19:36
  • @vanza, can you prove your claim? I can do a LOT less work, like using a length of 12 for an int32_t, 24 for an int64_t, 80 for a %f, etc. I already have a 5-10X speedup for handling just strings. I'd recommend benchmarking snprintf(). All the printf() variants are incredibly slow. Sometimes they're the only correct solution, but this isn't one of those times. – user2548100 Feb 18 '14 at 19:51
  • @JosephQuinsey, That's exactly the kind of link I was looking for. Many thanks!!! – user2548100 Feb 18 '14 at 19:52
  • @user2548100: (i) does your optimized version cover all the grammar that the printf family does, and (ii) is the slowdown really so noticeable that it justifies maintaining your own version of snprintf (especially since, apparently, you'll still be using snprintf to actually write the string)? Finally, unless you're baking assumptions about string lengths into your code, I'm not sure how you can get rid of `strlen`. If it's ok for you to operate under those assumptions ("baked in" string lengths, restricted grammar, ok to allocate unneeded memory, etc), you should make that clear. – vanza Feb 18 '14 at 21:15
  • Note: Values like "12 for an int32_t" are a good approach. I found `char[sizeof(some_int_type)*3 + 3]` works well. But `%f` does have a nasty worst case. With typical double, something like 320 bytes. Of course - this you may already know. – chux - Reinstate Monica Feb 18 '14 at 21:34
  • @vanza, I wouldn't be wasting people's time here creating a function I didn't need. I don't need all the grammar, and neither do millions of other C programmers. If you're running on a machine that has dozens, or even hundreds, of gigabytes, you don't care if you end up with an estimate that's too big, maybe by 2-3X. You care a lot that you don't truncate the string or overflow the buffer. The former failure mode seems to elude otherwise good programmers all the time, who believe that sNprintf() protects them. It doesn't. They just get a different failure mode. – user2548100 Feb 19 '14 at 01:35
  • @vanza, I think it's very clear that you can't get rid of strlen() to find the length of passed strings. On the other hand, that info is often available and not exported from other supporting string functions. I like the Kamailio and Gnome approach of having a struct that has a string pointer, allocation size, and str_len. – user2548100 Feb 19 '14 at 01:40
  • @user2548100: well, it wouldn't have hurt to include that info in the question, would it? Without that kind of information, I at least always err on the side of safety (in this case, pointing out that you might have to deviate from snprintf's interface to achieve any sort of speedup). Same thing for strlen; you didn't say anything about using your own struct for strings, so the safe assumption is that you're using basic C-style strings. – vanza Feb 19 '14 at 01:42
  • I'm using basic C-style strings, which requires strlen() processing for this particular question. – user2548100 Feb 19 '14 at 02:16
  • @vanza, I confirmed these benchmarks for converting an integer to a string, and snprintf() is indeed this much slower. http://stackoverflow.com/questions/21501815/optimal-base-10-only-itoa-function/21502575#21502575 – user2548100 Feb 19 '14 at 02:20
  • I believe this problem is much easier to solve than many believe. In short, set up a table of test values, feed them to snprintf(), and record the lengths it returns. All you need then is a parser and a hashtable (an unordered map from the STL) which associates a type specification with the values snprintf() returned for your test data at initialization time. Several tables could be kept, some whose values are hard-coded, depending on how optimistic you can afford to be in a given context. For chux's case above, pi might substitute for a worst case, best obtained as (4.0 * atan(1.0)) –  Jul 30 '14 at 07:38
  • @chux, the worst case is defined by _CVTBUFSIZE in _fcvt() in MSVC. –  Jul 30 '14 at 07:40

2 Answers


Allocate a buffer that is probably big enough first, then check whether it actually was long enough. If it wasn't, reallocate and call snprintf() a second time.

int len = 200;  /* Any number well chosen for the application to cover most cases */
int need;
char *buff = NULL;
do {
  need = len + 1;               /* len characters plus the terminating nul */
  buff = realloc(buff, need);   /* NULL return value deliberately not handled here */
  len = snprintf(buff, need, "...", ....);
  /* Error check for len < 0 */
} while (len >= need);          /* len >= need means the output was truncated */
/* buff = realloc(buff, len+1); shrink memory block */

By choosing your initial value well, you will need only one call to snprintf() in most cases, and the small over-allocation shouldn't be critical. If your environment is so tight that this over-allocation is critical, then you already have other problems with the expensive allocation and formatting. In any case, you can still call realloc() afterwards to shrink the allocated buffer to the exact size.
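As a rough sketch of how the loop above could be packaged, the function below wraps it behind a variadic interface using vsnprintf(). The name `format_alloc` and its `guess` parameter are hypothetical, and it assumes a C99-conforming vsnprintf() that returns the required length on truncation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: formats into a heap buffer that starts at
 * guess + 1 bytes and grows once if the first attempt was truncated.
 * Returns a string the caller must free(), or NULL on error. */
char *format_alloc(int guess, const char *fmt, ...)
{
    char *buf = NULL;
    int need;
    int len = (guess < 0) ? 0 : guess;

    do {
        va_list ap;
        char *tmp;

        need = len + 1;                     /* room for the terminating nul */
        tmp = realloc(buf, (size_t)need);
        if (tmp == NULL) {                  /* keep the old block freeable */
            free(buf);
            return NULL;
        }
        buf = tmp;

        va_start(ap, fmt);                  /* restart the argument list */
        len = vsnprintf(buf, (size_t)need, fmt, ap);
        va_end(ap);

        if (len < 0) {                      /* encoding or output error */
            free(buf);
            return NULL;
        }
    } while (len >= need);                  /* truncated: retry at exact size */

    return buf;
}
```

With a well-chosen `guess` the loop body runs once. When it runs twice, the second pass uses the exact length reported by the first, so it does not loop further.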

Patrick Schlüter
  • Pretty much the approach I'm using now, except I have a well-defined step size for realloc(). Decent, but I think a quick and dirty estimate is better in my context. A nice tweak is to keep a max_size because in my context this is running in a loop. The only caveat on this is there should be some limit on the max size to prevent massive allocations due to missing nul-terminators, or malicious string injections. – user2548100 Feb 18 '14 at 19:57
  • In that case, you will have the same problem with an estimate based on parsing the format string and calling `strlen()`. In any case, when building strings in a time-critical loop, I prefer to avoid the generic format functions completely and work only with O(1) string concatenation, i.e. no `strcat()` and `strlen()`. – Patrick Schlüter Feb 18 '14 at 20:02
  • I'm not sure what you mean by "work only with O(1) string concatenation". Could you expand on that? I'm guessing you are talking about Kamailio style _str { char *p; int len;} "strings"? I like that approach, but trying to avoid scope creep on this rickety old system I've had dumped in my lap. – user2548100 Feb 19 '14 at 02:13
  • Yes, it's exactly that. Using and tracking the length of the strings you work on, avoiding the O(n) functions `strcat()`, `strlen()` and such. It doesn't always require a string class. It only needs to pass the length in the parameters and have a concat function that returns either the resulting length or a pointer to the end. Of course this all depends on the project and the platform and the situation. – Patrick Schlüter Feb 19 '14 at 08:36
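The length-tracking idea discussed in these comments could look roughly like the sketch below. The `struct sbuf` type and `sbuf_append` function are hypothetical names, not taken from Kamailio or any other library. "O(1)" here means the destination is never rescanned to find its end; copying the source bytes is still proportional to their count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracking buffer over caller-provided storage. */
struct sbuf {
    char  *data;  /* storage owned by the caller */
    size_t cap;   /* total bytes available in data */
    size_t len;   /* bytes used so far, excluding the terminating nul */
};

/* Appends n bytes from src and keeps the buffer nul-terminated.
 * Returns 0 on success, or -1 (leaving the buffer unchanged) if the
 * bytes plus the terminating nul would not fit. Assumes len <= cap. */
int sbuf_append(struct sbuf *b, const char *src, size_t n)
{
    if (n >= b->cap - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);   /* write at the known end */
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Callers that already know a string's length pass it in directly, which is what removes the `strlen()` and `strcat()` scans from the loop.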

If the first argument to snprintf() is NULL and the size argument is 0, the return value is the number of characters that would have been written, not counting the terminating nul.
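A minimal sketch of that two-pass pattern, assuming a C99-conforming snprintf() (some older, non-C99 implementations behave differently). The function `make_pair` and its format string are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: builds "<name>=<value>" in an exactly sized
 * heap buffer. Returns a string the caller must free(), or NULL. */
char *make_pair(const char *name, int value)
{
    char *buf;

    /* Pass 1: measure only; nothing is written when the size is 0. */
    int len = snprintf(NULL, 0, "%s=%d", name, value);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);      /* +1 for the terminating nul */
    if (buf == NULL)
        return NULL;

    /* Pass 2: format for real into the exactly sized buffer. */
    snprintf(buf, (size_t)len + 1, "%s=%d", name, value);
    return buf;
}
```

This always formats twice, so it trades the speed concern raised in the question for an exact allocation.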

eduffy