0

I'm trying to recreate a simple String parser for user input. I can get the input String without issue and determine how many individual Strings are in it to properly give the correct size to malloc. Initially I wasn't finding the number of Strings in stdin and would just realloc my String array to allow space for one more String and then malloc the new String. Now that I have the size, I believe this isn't necessary as I can just provide malloc with the proper size.

(Attempted with calloc(), segfaulted in the same spot)

What I am confused with, is when I leave realloc in my code there is no issue (aside from knowing it will likely adjust my array in an undesirable fashion). But, If I remove realloc, my code segfaults the instant fgets() tries to execute.

This is just where I call the parser, and my #includes. "appserver.h" holds all prototypes and std(lib/bool/io),string/unistd/limits.h

#include "appserver.h"

int main(int argc, char *argv[])
{
  if(argc != 4)
  {
    exit(0);
  }
  struct userInput user = { parseInt(*argv[1]), parseInt(*argv[2]), argv[3] };

  printf("> ");
  char **newArgs = stringParser();
  for (int i = 0; newArgs[i] != NULL; i++)
  {
    free(newArgs[i]);
  }
  free(newArgs);
}

Below is my code for the parser. If I leave the realloc uncommented I can step through my code without issue (stepping through in gdb shows "input" with the whole string from stdin) but, if I comment it out I segfault right as I try to retrieve user input in fgets(). Why does commenting out realloc() even have an effect on fgets()?

char **stringParser()
{
  char *input;
  fgets(input, INT_MAX, stdin);
  int size = numberOfStrings(input);
  char **inputArray = malloc((size)*(sizeof(char*)));
  inputArray[0] = malloc(sizeof(char*));
  strcpy(inputArray[0], strtok(input, " "));
  for(int i = 1; i < size /*inputArray[i-1] != NULL*/; i++)
  {
    // inputArray = realloc(inputArray, (i+1)*sizeof(char*));
    inputArray[i] = malloc(sizeof(char*));
    strcpy(inputArray[i], strtok(NULL, " "));
    printf("Inside inputArray[%d]: %s\n", i-1, inputArray[i-1]); 
  }
  return inputArray;
}

Here is the code for my numberOfStrings() method as well in the event this might be worth looking into but, I stepped through it with gdb as well and it seems pretty concrete.

int numberOfStrings(char *input)
{
  int count = 0;
  char *tempCopy = malloc(sizeof(char*));
  strcpy(tempCopy, input);
  char* token = strtok(tempCopy, " ");
  while(token != NULL)
  {
    token = strtok(NULL, " ");
    count++;
  }
  free(tempCopy);
  return count;
}


EDIT: I wanted to followup to make sure I've avoided most areas of potential undefined behavior. I didn't change anything in main, so here are my changes in **stringParser()
char **stringParser()
{
  char *input = nextLine(stdin);
  int size = numberOfStrings(input, strlen(input));
  char **inputArray = calloc(size, sizeof(*inputArray));
  char *token = strtok(input, " ");
  inputArray[0] = malloc(strlen(token));
  strcpy(inputArray[0], token);
  for(int i = 1; i < size - 1; i++)
  {
    token = strtok(NULL, " ");
    inputArray[i] = malloc(strlen(token));
    strcpy(inputArray[i], token);
  }
  free(input);
  inputArray[size-1] = (char*)NULL;
  return inputArray;
}   

The main areas of change are:

  • I use calloc() instead of malloc(), although if my understanding is correct then calloc(size, sizeof(*inputArray)) behaves the same (ignoring the '0' or garbage value initialization) as malloc((size)*sizeof(*inputArray))
  • A friend of mine wrote out a *nextLine(FILE *input) which allows me to scan, character by character, through stdin until I reach either EOF or '\n'. This has the benefit of allocating exactly the amount of memory I need for the user input.
  • I'm ensuring that each String inside my array is only allocated the necessary amount of memory. (As opposed to sizeof(char*) as I had done before)
  • I've made sure to include a null terminating value at the end of my String array. (I made one change in numberOfStrings() where I just added count++ before the return statement.

These changes have just brought me to a point where I no longer have any memory leaks. I dare not assume I've avoided all acts of undefined behavior but, I imagine I'm doing a little better at this point. I'll be sure not to clog up this post with a bunch of other questions but wanted to leave this update here in the event someone might find it useful or relevant.

  • 1
    Please explain the line `char *tempCopy = malloc(sizeof(char*));` – M.M Oct 19 '17 at 04:23
  • 2
    `fgets(input, INT_MAX, stdin);` : `input` is uninitialized. – BLUEPIXY Oct 19 '17 at 04:24
  • @M.M I just don't like putting raw integer values, and have not implemented a way to allocate the exact length of the string yet. Or if there is any easy built in way to do it, I have not discovered it yet. Out of ignorance I have just used sizeof(char*) as it has not given me any trouble yet. – Drewzillawood Oct 19 '17 at 04:27
  • Allocating insufficient space and then writing past the end of that space would cause your problem. It is giving you trouble now as evinced by this question. – M.M Oct 19 '17 at 04:28
  • I suppose my confusion stems from why that issue doesn't occur if I just leave realloc() in? As fgets() occurs well before the realloc() or even malloc() happens. – Drewzillawood Oct 19 '17 at 04:31
  • You're just randomly overwriting memory you don't own. Bad things may or may not happen depending on what other use that memory was being put to; and it's somewhat unreliable to make exact predictions – M.M Oct 19 '17 at 04:59
  • @M.M Thank you for the clarification. – Drewzillawood Oct 19 '17 at 05:04
  • Consider using [getline(3)](http://man7.org/linux/man-pages/man3/getline.3.html) like [here](https://stackoverflow.com/a/9171511/841108) – Basile Starynkevitch Oct 19 '17 at 07:55

1 Answers1

1

The call of realloc() is not affecting fgets() - at least, not directly, predictably, or reliably.

The first call of fgets(input, INT_MAX, stdin) in stringParser() has undefined behaviour because the pointer input is uninitialised.

Practically, the call of fgets() is probably overwriting some area of memory that it shouldn't.

By adding (or commenting out) the call of realloc(), the result will be some adjustment of the layout of memory used by your program (e.g. changing whether code or data is located at the memory location(s) being overwritten by fgets()).

But this is not because realloc() directly affects fgets(). It is simply a side-effect of the fact that calling realloc() changes something in your program.

The effect of commenting out the realloc() call or reinserting it could be anything. For example, you might get different effects if your code is built using different compiler settings (like optimisation flags) or even a different compiler.

To eliminate the problem with fgets() pass it an array

  char input[some_positive_value];
  fgets(input, sizeof(input), stdin);

or initialise the pointer to point at a suitable buffer.

  char *input = malloc(some_positive_value);   /*  remember to `free()` when done */
  fgets(input, some_positive_value, stdin);

Note that the loop, as described, does NEED the realloc() call

 // inputArray = realloc(inputArray, (i+1)*sizeof(char*));
 inputArray[i] = malloc(sizeof(char*));

without the realloc() call, the assignment to inputArray[i] will ALSO have undefined behaviour. If you haven't seen symptoms of that, you just got lucky - again, it is an affect that you can't rely on.

It would also be a good idea to check that various functions (fgets(), realloc(), etc) actually succeeds. Your code is proceeding based on an ASSUMPTION that the functions all work as intended, but the behaviour will be undefined if they fail (e.g. if memory reallocation by realloc() fails).

Peter
  • 35,646
  • 4
  • 32
  • 74