4

I want to do something like this:

char sLength[SIZE_T_LEN];

sprintf(sLength, "%zu", strlen(sSomeString));

That is, I want to print in a buffer a value of type size_t. The question is: what should be the value of SIZE_T_LEN? The point is to know the minimum size required in order to be sure that an overflow will never occur.

chqrlie
Mr Sunday

4 Answers

8

sizeof(size_t) * CHAR_BIT (from limits.h) gives the number of bits. For decimal output, use a (deliberately low) estimate of 3 bits per digit (4 for hex): a decimal digit actually needs log2(10) ≈ 3.32 bits, so dividing the bit count by 3 errs on the safe side. Don't forget to add 1 for the nul terminator of the string.

So:

#include <limits.h>

#define SIZE_T_LEN ( (sizeof(size_t) * CHAR_BIT + 2) / 3 + 1 )

This yields a value of type size_t itself. On typical 8/16/32/64 bit platforms the + 2 rounding term is not required (the minimum possible width of size_t is 16 bits): there the accumulated error is already large enough that the truncating division still yields a sufficient size.

Note that this gives an upper bound and is fully portable thanks to the use of CHAR_BIT. To get an exact value, you would have to use log10(SIZE_MAX) (see John Bollinger's answer for this). But that yields a floating-point value and may be computed at run time, while the version above is evaluated at compile time; the run-time computation likely costs more than the few extra bytes the rough estimate reserves. Unless you have a very RAM-constrained system, that is fine. And on such a system, you should refrain from using stdio anyway.

To be absolutely on the safe side, you might want to use snprintf (but that is not necessary).

too honest for this site
  • `size_t` may contain padding bits, so this is an overestimation in general. The correct answer should probably involve `SIZE_MAX` in some capacity. – Kerrek SB Jan 26 '16 at 17:32
  • @KerrekSB: You might be right, but as that is an upper bound anyway and most current systems don't have padding bits, this will never be too small. – too honest for this site Jan 26 '16 at 17:34
  • Not natural log but base 10, as in `ceil(log10(SIZE_MAX))` – Weather Vane Jan 26 '16 at 17:47
  • @WeatherVane: Thanks, I took the version from your comment, if you don't mind. I don't work much with `math.h` in my field and I learned `log` actually being the decimal logarithm (`ln` would be logarithmus naturalis). Anyway, I really think the simple version is absolutely fine, as it is just to be portably on the safe side. – too honest for this site Jan 26 '16 at 17:52
  • Yes as you say, the log method is no use for a predefined value. – Weather Vane Jan 26 '16 at 17:54
  • You should actually use a lower estimate for the number of bits per digit since you divide! `#define SIZE_T_LEN (((sizeof(size_t) * CHAR_BIT) + 3) / 3 + 1)`. – chqrlie Jan 26 '16 at 18:01
  • @chqrlie: My bad! Thanks, I should not do maths today. (OTOH, you also had a flaw by adding 3, not 2 for rounding;-). – too honest for this site Jan 26 '16 at 18:08
  • @chqrlie: Actually, explicit rounding is not required due to accumulation of the error. See the edit. – too honest for this site Jan 26 '16 at 18:15
  • While this is a practical method to reserve enough space for the number string, is it _really_ guaranteed that sprintf will print only one char per digit? Or, from another perspective, would a system violate the C standard if it used a character encoding for printing the number string which needed 2 bytes per digit? – Ctx Jan 26 '16 at 18:49
  • @Olaf: my bad for being inadvertently too pessimistic... btw your not rounding up the division is not strictly correct: on the DS9K model 17, with 17 bit `size_t` types, `131071` does not fit in 5 bytes. Same problem with model 20 and some other ones, but it works for 16, 32 and 64 bits. – chqrlie Jan 26 '16 at 20:42
  • @Ctx: `sprintf` is guaranteed to print only 1 `char` per digit. It may use different digit values on different platform because not every computer uses ASCII, but all digits are guaranteed by the standard to be single chars. – chqrlie Jan 26 '16 at 20:45
  • @chqrlie Thanks for clarifying! Do you by chance have a hint for me from where in the standard this can be deducted? I couldn't find the place. – Ctx Jan 26 '16 at 20:47
  • 1
    @Ctx: C11 **5.2.1 Character sets** *... The representation of each member of the source and execution basic character sets shall fit in a byte. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.* All letters, digits and C graphics symbols are guaranteed to be represented as one byte, and the digits are guaranteed to be consecutive. – chqrlie Jan 26 '16 at 20:57
  • @chqrlie Thank you, I think it means what you say. So, in a japanese environment, printf with %d printing japanese digits in utf8 can never be standard compliant. Too bad, but I guess it would break way too much when trying to change that. – Ctx Jan 26 '16 at 21:05
  • @Ctx: even in a Japanese environment, `printf` would convert numbers using regular digits, not Japanese kana. The same holds for Arabic, Hindi, etc. Handling local culture to this regard is well beyond the scope of the Standard C library, it requires language specific software. IMHO, the attempts at supporting local encodings in `` were a bad idea. They were never able to deal with the subtleties of different languages, and as far as I know are pretty much obsolete now. – chqrlie Jan 26 '16 at 22:37
  • @chqrlie: Re-added rounding and a note. I really had no idea about platforms _that_ exotic (with 18 bit `size_t` it works again). But as I use `CHAR_BIT` already, it should be respected. – too honest for this site Jan 26 '16 at 23:13
  • @chqrlie: If a `char` is 2 octets, `sizeof(char)` still yields `1`. `CHAR_BIT` would be `16` then, i.e. 16 bit bytes. – too honest for this site Jan 26 '16 at 23:15
  • @Olaf: let me reassure you: *DeathStation 9000 (often abbreviated DS9K) is a fictional computer architecture often used as part of a discussion about the portability of computer code (often C code). It is imagined to be as obstructive and unhelpful as possible, whilst still conforming to any relevant standards, deliberately acting unexpectedly whenever possible.* The article on wikipedia was deleted https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/DeathStation_9000 and the github project https://github.com/nitrix/ds9k is not enticing. – chqrlie Jan 26 '16 at 23:33
  • @Ctx: If a char is 2 octets, sizeof(char) still yields 1. CHAR_BIT would be 16 then, i.e. 16 bit bytes, so it still was a single `char`. – too honest for this site Jan 26 '16 at 23:53
  • @chqrlie: sorry, wrong addressee (darn auto-completion). Thanks for the info about DS9K; that was new to me. I like the idea, though, it apparently works (fun fact: 13 bit bytes will not cause problems here). Until then I just knew about bit sizes which are multiples of 2. – too honest for this site Jan 26 '16 at 23:58
  • @Olaf: I like the idea of designing a configurable virtual machine with an appropriate C compiler that would embody the concept. I wonder how much software would survive the portability test... along the lines of Doug McIlroy's excellent paper [*A Killer Adversary for Quicksort*](http://www.cs.dartmouth.edu/~doug/mdmspe.pdf), it should be possible to write a tool that can determine how to configure the DS9K to kill any given piece of software. – chqrlie Jan 27 '16 at 00:04
  • @Olaf the question targeted more the case, if printf with %d could possibly emit utf8-digits from some fancy charset, but I think, chqrlie is right that this is currently not standard conform (and probably cannot be without having huge impact on other parts of it). – Ctx Jan 27 '16 at 08:52
  • @Ctx: I don't think OP did ask about multi-character encodings (seriously: where do you see an indicator for such a concern?). Anyway, I agree this is not possible from the standard. For example, how would e.g. `isdigit` handle that (or the other `ctype` functions? Although they take an `int`, it is [interpreted as `unsigned char` or `EOF`](http://port70.net/~nsz/c/c11/n1570.html#7.4p1). – too honest for this site Jan 27 '16 at 13:07
  • Please refrain from answering off-topic questions. Your answer will prevent the question from being roomba'd and cause manual work for the community to delete it. – old_timer Jul 25 '17 at 17:21
  • @old_timer: Took you quite some time to find one:-) This one is not that **obviously** OT, I did not VtC it (as you did on the question I dropped the comment at). I might have refrained from answering if I had found the dupe myself. (btw: this question will not roomba anyway). – too honest for this site Jul 25 '17 at 17:37
  • @Olaf although OT based the DMA question will continue to come back over and over again...there are countless not-deleted or community questions on the site just like that...Being completely hands off wont necessarily push an OT question off the site, so participation can help shed a light on things...I do not have a problem with trying to encourage OT questions off the site... – old_timer Jul 25 '17 at 19:10
3

The exact answer, accounting for space for a string terminator, would be

 log10(SIZE_MAX) + 2

where the SIZE_MAX macro is declared in stdint.h. Unfortunately, that's not a compile-time constant (on account of the use of log10()). If you need a constant computed at compile time, you could use this:

sizeof(size_t) * CHAR_BIT * 3 / 10 + 2

That gives the correct answer for all the usual size_t widths: 16, 32, 64, even 128 bits. It's based on the fact that 2^10 (1024) is pretty close to 10^3 (1000). Because log10(2) ≈ 0.30103 is slightly more than 3/10, the estimate eventually falls behind: at 256 bits, for example, it comes out one byte short (78 where 79 are needed), and for much larger widths it falls further behind. If you're worried about such large size_t then you could add one or even two to the result.

John Bollinger
  • To be a bit nit-picky about the "exact": if `log10(SIZE_MAX)` is already an integer, you allocate one entry too much ;-) – too honest for this site Jan 26 '16 at 17:56
  • My commented answer was wrong: `ceil` would fail in the case of `SIZE_MAX == 1000` (say). – Weather Vane Jan 26 '16 at 17:58
  • Given that the `*_MAX` macros represent the maximum representable values of their corresponding integer types, and given that C's rules for the representation of integers require an equivalent of binary representation, `SIZE_MAX` must always be one less than some power of 2. That can never be a power of 10. – John Bollinger Jan 26 '16 at 18:02
  • @Olaf, in any case, if `log10(SIZE_MAX)` *were* exactly an integer, the computation I present would still be correct. Say `SIZE_MAX` was 1000, so `log10(SIZE_MAX)` is 3. My formula then evaluates to 5, which is exactly what you need to accommodate the value 1000, which must be supported. – John Bollinger Jan 26 '16 at 18:06
  • @JohnBollinger: Did I mention I don't like logarithms? Just let's blame the full moon ... ;-) Nevertheless, I think we have two answers for the OP to choose from. – too honest for this site Jan 26 '16 at 18:19
2

If you can use snprintf, use:

int len = snprintf (NULL, 0, "%zu", strlen (sSomeString));
mikedu95
0

On systems where a char is represented using 8 bits,

If sizeof(size_t) is 2, then the maximum value is: 65535
If sizeof(size_t) is 4, then the maximum value is: 4294967295
If sizeof(size_t) is 8, then the maximum value is: 18446744073709551615
If sizeof(size_t) is 16, then the maximum value is: 340282366920938463463374607431768211455

You can use that information to extract maximum size of the string using the pre-processor.

Sample program:

#include <stdio.h>

#ifdef MAX_STRING_SIZE
#undef MAX_STRING_SIZE
#endif

// MAX_STRING_SIZE is 6 when sizeof(size_t) is 2
// MAX_STRING_SIZE is 11 when sizeof(size_t) is 4
// MAX_STRING_SIZE is 21 when sizeof(size_t) is 8
// MAX_STRING_SIZE is 40 when sizeof(size_t) is 16
// MAX_STRING_SIZE is -1 for all else. It will be an error to use it 
// as the size of an array.

#define MAX_STRING_SIZE (sizeof(size_t) == 2 ? 6 : sizeof(size_t) == 4 ? 11 : sizeof(size_t) == 8 ? 21 : sizeof(size_t) == 16 ? 40 : -1)

int main()
{
   char str[MAX_STRING_SIZE];
   size_t a = 0xFFFFFFFF;

   sprintf(str, "%zu", a);
   printf("%s\n", str);

   a = 0xFFFFFFFFFFFFFFFF;
   sprintf(str, "%zu", a);
   printf("%s\n", str);
}

Output:

4294967295
18446744073709551615

It will be easy to adapt it to systems where a char is represented using 16 bits.

R Sahu
  • Your method is complicated and not foolproof: you do not handle 16 bit systems, nor embedded systems with 16 bit chars where sizeof(size_t) may be 4 but is still 64 bits. – chqrlie Jan 26 '16 at 18:03
  • @chqrlie, I am surprised you find it complicated. Good observation about 16 bit systems though. – R Sahu Jan 26 '16 at 18:11
  • This also ignores the possibility of padding bits. – Kerrek SB Jan 26 '16 at 18:20
  • @KerrekSB, I am afraid I understand that term only superficially. Can you point to me to some useful reading material? That'll be appreciated. Thanks. – R Sahu Jan 26 '16 at 18:26
  • @RSahu: Maybe C11 6.2.6.2? – Kerrek SB Jan 26 '16 at 18:30
  • @KerrekSB: RSahu's proposal quietly ignores exotic systems that have weird sizes (3, 5, 6, 7...) or `CHAR_BIT != 8`, but the potential presence of padding bits is not a problem: it would lower the maximum value for a `size_t`. – chqrlie Jan 26 '16 at 20:53