52

When converting an int like so:

char a[256];
sprintf(a, "%d", 132);

what's the best way to determine how large a should be? I assume manually setting it is fine (as I've seen it used everywhere), but how large should it be? What's the largest int value possible on a 32 bit system, and is there some tricky way of determining that on the fly?

idmean
Dominic Bou-Samra
  • Of course, if using C++ is an option instead then you can just use std::string and std::stringstream to accomplish what you want without even thinking about memory requirements. But that really depends. I know that the question is for C but maybe this might be useful anyway. – Robert Massaioli Oct 13 '10 at 00:33
  • 8
    @Robert: if using Python is an option instead, then you can use `str` ;-p – Steve Jessop Oct 13 '10 at 00:35
  • @Steve I see your point, just wanted to see if the poster had just jumped onto C when they did not have to. – Robert Massaioli Oct 13 '10 at 00:42
  • 3
    @Robert: It's for a uni assignment. I hated C prior to this unit. Now I love the simplicity. It's unforgiving, but very satisfying, with a big learning curve given I usually dabble in Python/managed langs. – Dominic Bou-Samra Oct 13 '10 at 01:11
  • @Dominic Don't get me wrong. I love C because it lets you get really close to the computer, but there is a time and a place for C development. It is excellent for teaching purposes and everyone should know it if they want to be a Computer Scientist. However, I now only use it when I want to do microcontroller development. I would argue that if you are having to think about this sort of issue then it is the wrong tool for the job. If I were you, for a uni assignment, I would take the path that ends up with the fewest bugs: make it bigger than you need and just don't worry; anything else is premature optimisation. – Robert Massaioli Oct 13 '10 at 01:32
  • Oh BTW, if you want to know the maximum size of an int then: http://www.cplusplus.com/reference/clibrary/climits/. That is present in C and C++. – Robert Massaioli Oct 13 '10 at 01:33
  • 3
    @Robert: I would argue that if you are having to think about this sort of issue, no language other than C or assembly could possibly meet your requirements. Any other language will have monstrous difficult-to-predict stack usage, heap fragmentation, etc. – R.. GitHub STOP HELPING ICE Oct 13 '10 at 02:19
  • @Robert. Not sure how CS units are in the US, but in Aus, we are given very strict requirements. The whole unit is C based, so we must use C. – Dominic Bou-Samra Oct 13 '10 at 02:37
  • 4
    In the GNU world you have `asprintf`, which will internally `malloc` the needed amount of memory. – utopianheaven Apr 01 '13 at 20:39
  • @utopianheaven This is a revelation! Why didn't I know about this before? :D – cat Sep 27 '16 at 21:52
  • related: http://stackoverflow.com/questions/3774417/sprintf-with-automatic-memory-allocation – Ciro Santilli OurBigBook.com Mar 19 '17 at 07:04

7 Answers

109

Some here are arguing that this approach is overkill, and for converting ints to strings I might be more inclined to agree. But when a reasonable bound for string size cannot be found, I have seen this approach used and have used it myself.

int size = snprintf(NULL, 0, "%d", 132);
char * a = malloc(size + 1);
sprintf(a, "%d", 132);

I'll break down what's going on here.

  1. On the first line, we want to determine how many characters we need. The first 2 arguments to snprintf tell it that I want to write 0 characters of the result to NULL. When we do this, snprintf won't actually write any characters anywhere, it will simply return the number of characters that would have been written. This is what we wanted.
  2. On the second line, we dynamically allocate memory and assign it to a char pointer. Make sure to add 1 to the required size (for the trailing \0 terminating character).
  3. Now that there is enough memory allocated to the char pointer, we can safely use sprintf to write the integer to the char pointer.

Of course you can make it more concise if you want.

char * a = malloc(snprintf(NULL, 0, "%d", 132) + 1);
sprintf(a, "%d", 132);

Unless this is a "quick and dirty" program, you always want to make sure to free the memory you allocated with malloc. This is where the dynamic approach gets complicated in C. However, IMHO, if you don't want to allocate huge char buffers when most of the time you will only use a very small portion of them, then I don't think this is a bad approach.
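Putting the steps above together with the failure checks they leave out, a minimal sketch might look like the following. The helper name `int_to_string` is hypothetical, and the sketch assumes a C99-conforming `snprintf` that returns the required length when given a zero-size buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd decimal string for value,
 * or NULL on failure. The caller must free() the result. */
char *int_to_string(int value)
{
    /* Pass 1: ask snprintf how many characters are needed (excluding '\0'). */
    int size = snprintf(NULL, 0, "%d", value);
    if (size < 0)
        return NULL;                      /* encoding error */

    char *a = malloc((size_t)size + 1);   /* +1 for the terminating '\0' */
    if (a == NULL)
        return NULL;                      /* out of memory */

    /* Pass 2: write into the buffer, still bounded by its real size. */
    snprintf(a, (size_t)size + 1, "%d", value);
    return a;
}
```

A caller would check the result for NULL, use the string, and then pass it to `free`.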

Daniel Standage
  • Actually, you could often just use `alloca` instead of `malloc`. And if the resulting code is still too bloaty, make a macro for it. – thejh Aug 19 '13 at 14:31
  • 5
    How portable is `alloca`? It certainly isn't ANSI C. – Daniel Standage Aug 19 '13 at 19:24
  • 4
    Surely the portable C99 alternative to alloca is just to use a variable-length array? `int size = ...; char a[size+1]; sprintf(...`? – Tommy Apr 19 '16 at 12:50
  • 2
    Don't forget to handle the case where malloc() returns NULL, unless you don't care if your program crashes or is insecure. This is especially true if the size depends on input from outside the program. – Jeff Learman Aug 14 '17 at 20:28
  • 1
    Definitely +1, I don't get why this is not the accepted answer. It's exactly what the man page explains. – Marco Bonelli Jan 18 '19 at 15:33
20

It's possible to make Daniel Standage's solution work for any number of arguments by using vsnprintf, which is available in C99 and C++11.

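#include <stdarg.h>
#include <stdio.h>

int bufferSize(const char* format, ...) {
    va_list args;
    va_start(args, format);
    // With a NULL buffer and size 0, nothing is written; only the length is returned.
    int result = vsnprintf(NULL, 0, format, args);
    va_end(args);
    return result + 1; // extra byte for the terminating \0
}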

As specified in the C99 standard, section 7.19.6.12:

The vsnprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred.
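As a sketch of how that return value can drive the allocation itself, the function below measures and then writes in one call. The name `format_alloc` is hypothetical; the one non-obvious detail is `va_copy` (C99), which is needed because a `va_list` cannot be reused once `vsnprintf` has consumed it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: printf-style formatting into a malloc'd buffer.
 * Returns NULL on failure; the caller must free() the result. */
char *format_alloc(const char *format, ...)
{
    va_list args, args_copy;
    va_start(args, format);
    va_copy(args_copy, args);   /* keep a second copy for the writing pass */

    /* Pass 1: measure. Nothing is written when the size is 0. */
    int needed = vsnprintf(NULL, 0, format, args);
    va_end(args);

    char *buf = NULL;
    if (needed >= 0) {          /* negative means an encoding error */
        buf = malloc((size_t)needed + 1);
        if (buf != NULL)
            vsnprintf(buf, (size_t)needed + 1, format, args_copy);  /* pass 2: write */
    }
    va_end(args_copy);
    return buf;
}
```

This is roughly what the GNU `asprintf` mentioned in the comments on the question does, but built only from standard C99 calls.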

jww
Regis Portalez
13

The max possible number of bits in an int is CHAR_BIT * sizeof(int), and a decimal digit is "worth" at least 3 bits, so a loose upper bound on the space required for an arbitrary int is (CHAR_BIT * sizeof(int) / 3) + 3. That +3 is one for the fact that we rounded down when dividing, one for the sign, one for the nul terminator.

If by "on a 32 bit system" you mean that you know int is 32 bits, then you need 12 bytes. 10 for the digits, one for the sign, one for the nul terminator.
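The loose bound described above can be written as a compile-time constant, so the buffer is sized correctly whatever width `int` has on the target. This is only a sketch; the names `INT_STR_BOUND` and `format_int` are made up for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Loose upper bound on the bytes needed to print any int with "%d":
 * bits / 3 digits, +1 for rounding down, +1 for the sign, +1 for '\0'. */
#define INT_STR_BOUND (CHAR_BIT * sizeof(int) / 3 + 3)

/* Hypothetical helper: formats value into buf, which must hold at least
 * INT_STR_BOUND bytes. Returns the number of characters written. */
int format_int(char *buf, int value)
{
    return snprintf(buf, INT_STR_BOUND, "%d", value);
}
```

A caller declares `char buf[INT_STR_BOUND];` on the stack and passes it in; no allocation and no failure path are involved.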

In your specific case, where the int to be converted is 132, you need 4 bytes. Badum, tish.

Where fixed-size buffers can be used with a reasonable bound, they are the simpler option. I not-so-humbly submit that the bound above is reasonable (13 bytes instead of 12 for 32 bit int, and 24 bytes instead of 21 for 64 bit int). But for difficult cases, in C99 you could just call snprintf to get the size, then malloc that much. That's overkill for such a simple case as this.

Steve Jessop
  • 2
    Using `malloc` for this is ridiculous. It overcomplicates your code by adding a failure case you have to check -- and what do you do if it fails?!? Simply use a correctly sized buffer like you explained how to do. – R.. GitHub STOP HELPING ICE Oct 13 '10 at 02:14
  • @R. Very well, I will expand upon my existing, "a bit overkill for such a simple case as this". – Steve Jessop Oct 13 '10 at 11:05
  • 3
    @R: Using malloc is not ridiculous if the person writing the code is interested in learning or knowing the underpinnings of the language itself. Telling people to just use std::string because all the hard work has already been done ignores this point, i.e. wanting to know how things work under the hood, so to speak. Maybe the original poster wants to know how std::string does what it does? – Eric Jun 28 '11 at 01:41
  • 4
    @Eric: this question is about C, nothing to do with `std::string`. – Steve Jessop Jun 28 '11 at 09:28
  • 8
    This is dangerous - a classic potential buffer overrun scenario. The exact output of the `printf` family of functions depends on the locale. For example, a locale may set the 'thousands' separator. – Brett Hale Jul 26 '14 at 05:26
  • 2
    @Brett: you had me going for a minute there, but of course `%d` doesn't use the thousands separator. `%'d` would, but that's not the question. – Steve Jessop Jul 26 '14 at 08:54
  • @SteveJessop - you're right about the thousands separator - I completely missed that - but I think the broader point about locale-dependent output still stands. I *want* to be wrong about this though. It's hard to get a complete overview of where locale settings can affect format specifiers. – Brett Hale Jul 26 '14 at 09:15
  • @BrettHale: the lesson I'm taking away is that it isn't simple to determine whether you have a simple case or not :-) In some ways that could be the C programmer's motto. I'm sure it was obvious to me at the time I wrote the answer that locale wasn't relevant, but it wasn't until I'd edited the answer to mention locale, then looked back up the question and spotted the format was just `%d`, that I realised it *was* simple after all. – Steve Jessop Jul 26 '14 at 09:29
  • OK, but if we extend this conversation to cover the wide character implementation of sprintf, swprintf, the math for the malloc is further complicated. Since malloc allocates bytes, while snprintf counts characters, the allocation must be doubled for wide characters. This is such an issue that I've reduced it to a set of macros. – David A. Gray Feb 22 '16 at 17:46
  • Reading the question, it's asking how to determine how big the buffer needs to be. It's providing an example using a particular format string and value but what if the format string has `%s`es in it and the string args can be several k each? – dash-tom-bang Jun 08 '17 at 18:52
  • 1
    @dash-tom-bang: I suggest asking another question if you want to know about calculating other upper limits statically. I've already said in the answer how to do it dynamically -- call `snprintf` to get the size actually needed for the args you have. That works just as well for string args as it does for the questioner's `int`. – Steve Jessop Jun 20 '17 at 11:21
7

I see this conversation is a couple of years old, but I found it while trying to find an answer for MS VC++ where snprintf cannot be used to find the size. I'll post the answer I finally found in case it helps anyone else:

VC++ has the function _scprintf specifically to find the number of characters needed.

David Ruhmann
HopeItHelps
2

If you're printing a simple integer, and nothing else, there's a much simpler way to determine output buffer size. At least computationally simpler, the code is a little obtuse:

char *text;
text = malloc(val ? (int)log10((double)abs(val)) + (val < 0) + 2 : 2);

log10(value) returns the number of digits (minus one) required to store a positive nonzero value in base 10. It goes a little off the rails for numbers less than one, so we specify abs(), and code special logic for zero (the ternary operator, test ? truecase : falsecase). Add one for the space to store the sign of a negative number (val < 0), one to make up the difference from log10, and another one for the null terminator (for a grand total of 2), and you've just calculated the exact amount of storage space required for a given number, without calling snprintf() or equivalents twice to get the job done. Plus it uses less memory, generally, than the INT_MAX will require.

If you need to print a lot of numbers very quickly, though, do bother allocating the INT_MAX buffer and then printing to that repeatedly instead. Less memory thrashing is better.

Also note that you may not actually need the (double) as opposed to a (float). I didn't bother checking. Casting back and forth like that may also be a problem. YMMV on all that. Works great for me, though.

  • 1
    Let us say that your integer is `INT_MIN`. Then `abs(val)` is `INT_MIN`, `log10` applied to it returns NaN, and converting NaN to `int` is undefined behavior. Besides, if you are printing 64-bit integers, the largest of them are not representable exactly as `double`. – Pascal Cuoq Jun 13 '14 at 17:08
  • @Pascal Cuoq -- Interesting quirk, thanks for that. I guess the solution isn't really complete without checking if we're trying to allocate storage for the string representation of INT_MIN, but I'm feeling lazy, so I'll leave that as an exercise for anyone who actually cares to do it. -- P.S. "undefined behavior" is an understatement. In GCC, printf("%d\n", (int)log10((double)abs(-2147483647 - 1))); spits out "-2147483648" instead of the expected "9" -- how odd. – Pegasus Epsilon Jan 16 '15 at 14:02
1

First off, sprintf is the devil. If anything, use snprintf, or else you risk trashing memory and crashing your app.

As for the buffer size, it's like all other buffers - as small as possible, as big as necessary. In your case, you have a signed integer, so take the largest possible size, and feel free to add a little bit of safety padding. There is no "standard size".

It's also not dependent on what system you're running on. If you define the buffer on the stack (like in your example), it depends on the size of the stack. If you created the thread yourself, then you determined the stack size yourself, so you know the limits. If you expect recursion or a deep call stack, then you need to be extra careful as well.

EboMike
  • Thank you. snprintf it is then. – Dominic Bou-Samra Oct 13 '10 at 00:32
  • 10
    There is no danger to using `sprintf` with a correctly sized buffer. For numeric output, sizing the buffer is easy. See Steve's answer. – R.. GitHub STOP HELPING ICE Oct 13 '10 at 02:17
  • @R.. Assuming another thread doesn't clobber some of the input behind your back... – tc. May 06 '13 at 12:00
  • 3
    @tc.: In that case `snprintf` would not help you. Unsynchronized access to data is **always** undefined behavior. It's equally possible that `snprintf` could overflow if the string size changes out from under it. – R.. GitHub STOP HELPING ICE May 06 '13 at 12:19
  • @R.. While it's not *guaranteed* to help you, I can think of obvious situations and implementations where `snprintf()` would at least stop it from crashing. – tc. May 06 '13 at 12:40
  • Not necessarily. `snprintf` might very well have a TOCTOU race, first checking `strlen` then doing `strcpy` or similar. Unless you've read the implementation source, you don't know. If your program is so broken that it's doing unsynchronized data access (one of the worst forms of UB because it "seems to work" "most of the time") then using `snprintf` here is not going to make a dent in your program's brokenness. – R.. GitHub STOP HELPING ICE May 06 '13 at 12:44
1

It's good that you are worried about buffer size. To apply that thought in code, I would use snprintf:

snprintf( a, 256, "%d", 132 );

or

snprintf( a, sizeof( a ), "%d", 132 );  // when a is array, not pointer
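Bounding the write prevents an overrun, but the return value still has to be checked to know whether the output was cut short. A minimal sketch, with the hypothetical helper name `write_int`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes value into buf of the given size.
 * Returns true only if the whole number fit (no truncation, no error). */
bool write_int(char *buf, size_t size, int value)
{
    int n = snprintf(buf, size, "%d", value);
    /* n is the length the full output needs, excluding '\0';
     * n >= size means the result was truncated. */
    return n >= 0 && (size_t)n < size;
}
```

With the 256-byte array from the question the call always succeeds for an `int`; with a buffer that is too small it returns false and leaves a truncated but still null-terminated string behind.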
Arun
  • 1
    `snprintf` only solves half the problem. If you're not sure your buffer is big enough, you need to use `snprintf` to test the necessary size or make sure your code does not have bugs when the output gets truncated. Steve's answer is a lot better. – R.. GitHub STOP HELPING ICE Oct 13 '10 at 02:15