1

I initially wrote a perfectly working C code for converting a positive integer (say 12345) to a string (i.e. "12345") and print it. The logic is simple: extracting the digits of the integer one-by-one by taking modulo 10 and reducing the integer by a factor of 10 each time (this occurs in the itos() function). The final string needs to hold the extracted digits in reverse order so as to match the original integer (cf. reverse() function).

#include <stdio.h>
#include <string.h>

char *reverse (char *s);
char *itos (int n);

int
main ()
{
  int n = 12345;
  char *s = itos (n);
  printf ("%s", s);
  return 1;
}

char *
itos (int n)
{
  int d = 0;
  char s[100];
  int i = 0;
  do
    {
      d = n % 10;
      s[i] = (d + '0');
      n = n / 10;
      i++;
    } while (n != 0);
  s[i] = '\0';
  return reverse(s);
}

char *
reverse (char *s)
{
  int i = 0;
  int j = strlen (s) - 1;
  char temp;
  while (i < j)
    {
      temp = s[i];
      s[i] = s[j];
      s[j] = temp;
      i++;
      j--;
    }
  return s;
}

You may compile it here. This works perfectly fine, but something strange happens if I return s instead of reverse(s) from the char* itos() function. That is:

#include <stdio.h>
#include <string.h>

char *itos (int n);

int
main ()
{
  int n = 12345;
  char *s = itos (n);
  printf ("%s", s);
  return 1;
}

char *
itos (int n)
{
  int d = 0;
  char s[100];
  int i = 0;
  do
    {
      d = n % 10;
      s[i] = (d + '0');
      n = n / 10;
      i++;
    } while (n != 0);
  s[i] = '\0';
  return s;
}

I'd have expected it to simply print the string "54321" instead of "12345". However, in the output screen, I simply get a (null). There might be a logical error in the char* itos() function, as in I might have done some illegal pointer operation, but I can't really locate the source of error.

I did try debugging by inserting print statements in several parts of the code. What I noticed is if I print the string s within the itos() function just before the return statement then it works fine and prints the string "54321". However, the print statement in the main() function still outputs a (null), which probably implies that there is some mismanagement of memory when the itos() function returns a character pointer to the main() function. Could someone please clarify?

  • 2
    The both programs have undefined behavior because the functions return pointers to a local array. – Vlad from Moscow Aug 20 '19 at 14:05
  • @VladfromMoscow Interesting, I didn't know that limitation! So, would defining a *global* character array be the way to go, or is there a better alternative which you would suggest? –  Aug 20 '19 at 14:07
  • 1
    @Blue You should pass the arrays as parameters to the function and allocate them on the caller side. This is very basic stuff best answered by a good beginner-level book. – Lundin Aug 20 '19 at 14:13

2 Answers2

3

Inside function itos you declared the variable s which is an array of 100 chars. This array is allocated somewhere after the start of the function and deallocated just before the function returns to its caller.

So when you put a printf inside itos, the variable s will still be a valid storage and this is why it worked as you intended. However as soon as itos returns, the memory reserved to the variable s should be considered deallocated thus can not be expected to still contain the data it had when itos was executing, creating undefined behavior if you rely on that.

Probably once deallocated, the memory pointed by s had its values modified, causing your function to not being able to proper convert the string back to a number.

andresantacruz
  • 1,676
  • 10
  • 17
  • Thanks, gotcha! So what would be the correct way to go about this program? Declaring a *global* `char s[100]`? –  Aug 20 '19 at 14:10
  • @Blue It's somewhat complicated to state here in a short answer what would be the "correct way to go about this" as it can consider tons of factors related to code quality, execution, etc. – andresantacruz Aug 20 '19 at 14:12
  • Although yes, declaring `s` as a global variable would fix this specific problem you are facing. – andresantacruz Aug 20 '19 at 14:13
  • You should avoid teaching people the rules of automatic storage using a stack frame. While automatic objects are ubiquitously implemented with a stack, the C standard does not construct its rules with such a concept, and teaching the wrong model can lead to incorrect deductions. (For example, if automatic storage **is** a stack, then objects on the stack **persist** after a function returns as long as space down to those objects is not yet reused for other purposes. But that deduction is false because the implementation is not purely a stack—it is a stack plus optimization during compilation… – Eric Postpischil Aug 20 '19 at 14:55
  • … and that optimization may mean objects are not really on the stack in the portrayed way). – Eric Postpischil Aug 20 '19 at 14:56
  • The code style is not a good way to mark phrases like “stack frame” as special terminology. Code style is to mark source code or other computer input/output/command text. To mark phrases as special, it suffices to use italics. – Eric Postpischil Aug 20 '19 at 14:57
  • @EricPostpischil agreed about my usage of code style markers. As for your argument about avoiding teaching people the rules of automatic storage I didn't understand what is the point you were trying to do. – andresantacruz Aug 20 '19 at 15:04
  • I just explained how stack works in intel-based architectures as a background to answer **why** OP was facing his issues with his code. – andresantacruz Aug 20 '19 at 15:05
  • Even if you consider cases where optimizations will store locals in registers or any other place, I can't see where my argument fails. It would be nice if you could elaborate a bit more about this. – andresantacruz Aug 20 '19 at 15:07
  • @dedecos: The OP did not say they were working on Intel-based architectures. They asked about a C program, and you answered using things that are not part of the C standard. The proper way to teach is to use the C standard’s rules about automatic storage duration, not to use stacks. – Eric Postpischil Aug 20 '19 at 15:09
  • @EricPostpischil as I made it crystal clear what I said was related to intel-based architectures I can't see the downside of explaining how that works in a intel-based architecture which is the architecture used in the vast majority of the systems. I will try to be more architecture-agnostic in my future answers though - I accept what you say as a valid point. – andresantacruz Aug 20 '19 at 15:18
  • If we were assured functions used a stack for their automatic objects, then we could rely on the code `char *bar(void) { char x[] = "abc"; return x; } void foo(void) { char string[4]; char *p = bar(); for (int i = 0; i < 4; ++i) string[i] = p[i]; puts(string); }` to print “abc”, because “abc” is put on the stack by `bar` and is not overwritten prior to its copying by `foo`. But, by the rules of the C standard, the behavior of this code is not defined by the C standard. Therefore, saying automatic objects are implemented by a stack is not a correct representation of the rules of C. – Eric Postpischil Aug 20 '19 at 15:21
  • In fact, when I compile and execute the above code (with an include of `` and a `main` that calls `foo`) using Apple LLVM 10.0.1 with clang-10001.0.46.4 on macOS 10.14.6, it does not print “abc”. Looking at the generated code shows that the data for `x` is in fact not put on the stack. And this is an Intel architecture. So the assertion that objects “local” to functions are written to the stack is not generally true. – Eric Postpischil Aug 20 '19 at 15:27
  • @EricPostpischil okay, so your problem with my answer is that I have stated that the variable `s` will (thus implying it always will) be allocated on the stack? In the first comments I thought you were arguing about I teaching inner details of the stack. – andresantacruz Aug 20 '19 at 15:27
  • @EricPostpischil I just updated the answer, please review it if you can. Thanks in advance. – andresantacruz Aug 20 '19 at 15:52
0

char s[100] is only valid within itos. When the function returns, the memory becomes invalid, and accessing it through the pointer you returned is undefined behavior. You could instead allocate the memory dynamically, like this:

char* s = malloc(100);

When you do it this way, you need to free it when you're done with it (such as after the printf in main), like this:

free(s);

So, would defining a global character array be the way to go?

This would also work. Generally, it has its advantages and disadvantages over dynamically allocating the memory. For instance, it would occupy the memory at all times even if your function isn't using it. But to make up for it, the function runs faster because it doesn't have to dynamically allocate/delete the memory every time the function runs. But on the other hand it increases the startup time because it will zero-initialize the array. And of course you couldn't run it multiple times in parallel with one global array. But for this small program, none of those matter much.

Blaze
  • 16,736
  • 2
  • 25
  • 44
  • -1 for "But to make up for it, the function runs faster because it doesn't have to dynamically allocate/delete the memory every time the function runs.". There will be no difference in allocation time of the stack using `s` as local or not. – andresantacruz Aug 20 '19 at 14:17
  • @dedecos Using a global array once isn't faster than using `malloc`/`free` every time the function runs? – Blaze Aug 20 '19 at 14:24
  • yeahs, it would be. However `s` is allocated in the `stack frame` as a local of `itos`. This is simply a matter of modifying the value that will already subtract the stack register to include the size of `s`. – andresantacruz Aug 20 '19 at 14:26
  • @dedecos I'm pointing out this speed difference as a comparison to using dynamic allocation as detailed in my answer, not compared to the wrong code that has undefined behavior. Sorry for being unclear, I'll change the wording a bit to make it less confusing. – Blaze Aug 20 '19 at 14:29
  • 1
    I didn't noticed you were comparing dynamic allocating `s` to use it as a global variable - thought you were saying that allocating `s` locally would have any impact on the startup of the function. Sorry for the misunderstanding! – andresantacruz Aug 20 '19 at 14:30