1

So I'm trying to read in from stdin a byte at a time. Each iteration of the while loop I am trying to reallocate the buffer, but I don't want to use realloc. This is what I've tried:

   char tempChar = '\0';
   char *buffer;
   int bufferSize = 0;


   buffer = (char*) malloc(sizeof(char));
   while((tempChar = getc(stdin)) != EOF)
   {
      buffer[bufferSize] = tempChar;
      bufferSize++;

      char *temp = buffer;
      buffer = (char*)malloc(sizeof(char)*bufferSize);
      memcpy(buffer, temp, sizeof(temp));
      free(temp);
   }
   buffer[bufferSize] = '\0';

I get a segmentation fault. Any idea why that happens?

EDIT: Ok I fixed the two bugs like other people have said. Here is the fixed version:

  char tempChar = '\0';
  char *buffer;
  int bufferSize = 1;
  int count = 0; 
  buffer = malloc(sizeof(char));

  while((tempChar = getc(stdin)) != EOF){
    buffer[count] = tempChar;
    count++;

    if(count >= bufferSize){
      bufferSize *= 2;
      char *temp = buffer;
      buffer = malloc(sizeof(char)*bufferSize);
      memcpy(buffer, temp, count);
      free(temp); 
    }
  }

  buffer[count - 1] = '\0';
klamse
  • 53
  • 1
  • 6
  • 6
    Why don't you want to use `realloc`? – Dai Oct 11 '15 at 02:38
  • You're off by one - after you read one character, you create a new buffer - of size **one**. And your `memcpy()` is wrong. – Andrew Henle Oct 11 '15 at 02:39
  • Just from a practical standpoint, I cannot think of a less efficient way to allocate memory. You should at least allocate some reasonable number of bytes to begin with and then increase be some reasonable amount each time it is needed. Why not `32` or `64` or `256`, etc... – David C. Rankin Oct 11 '15 at 02:47
  • 2
    `sizeof(temp)` is a constant; probably 4 or 8. – 5gon12eder Oct 11 '15 at 02:47
  • You should also use [`fgets`](http://en.cppreference.com/w/c/io/fgets) to read in a whole line at a time or – if your input is binary – [`fread`](http://en.cppreference.com/w/c/io/fread). Not only will it be much simpler but also *a lot* faster. – 5gon12eder Oct 11 '15 at 02:51
  • That's where size depends. There is no advantage to `fgets` unless the number of characters exceeds ~32 chars. In fact if you read at least 4-bytes at a time there is very little difference even for strings as long as 800,000 characters. The biggest speed disadvantage here is the repeated `malloc` and `free` taking place. – David C. Rankin Oct 11 '15 at 02:57
  • @5gon12eder - The code is reading user input from `stdin`. The code as posted is still likely to perform somewhat faster than the user can type. – Andrew Henle Oct 11 '15 at 03:00
  • You can redirect a file of any size to `stdin`. (including the 800,000 char test file). Another issue is the allocations themselves `buffer = malloc (sizeof *buffer);` and `buffer = malloc (sizeof *buffer * bufferSize);` will suffice and are less prone to mask errors due to an incorrect cast. – David C. Rankin Oct 11 '15 at 03:02
  • [Don't cast `malloc` in C](http://stackoverflow.com/q/605845/995714) – phuclv Oct 11 '15 at 03:27
  • What's wrong with casting malloc? – klamse Oct 11 '15 at 04:02
  • 1
    @DavidC.Rankin: Where did you get that figure 32? I would expect `fgets` to be faster almost immediately. The overhead of `fgets` should not be any greater than OP's loop, and the cost per character should be significantly lower (on the order of 1 cycle per character versus a whole function call per character). – R.. GitHub STOP HELPING ICE Oct 11 '15 at 04:29
  • @klamse http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc – PC Luddite Oct 11 '15 at 04:37
  • @R.. I tested with different size reads on that 800,000 char string. Comparing `fgetc_unlocked`, `fgets`, and `getline`. There were significant differences between single character read and read times for `fgets`, and `getline`. But for reads of 4-bytes or greater the differences all but disappeared. – David C. Rankin Oct 11 '15 at 08:15

4 Answers4

5

Your buffer is too small by one byte. This line

  buffer = (char*)malloc(sizeof(char)*bufferSize);

should read

  buffer = malloc(bufferSize + 1);

Don't cast the return from malloc() in C, and sizeof(char) is one by definition.

Also, this is wrong:

  memcpy(buffer, temp, sizeof(temp));

That copies a number of bytes equal to the size of a char *.

Andrew Henle
  • 32,625
  • 3
  • 24
  • 56
  • Just want to point out that `sizeof(char)` is not one by definition. The size of a char is purely implementation-defined. If I write an architecture that says chars are 5 bytes... well then `sizeof(char)` is five. Though, it's true that for all intents and purposes, a char will almost always be one byte. – fvgs Oct 11 '15 at 02:48
  • 3
    @hexturtle No, `sizeof(char) == 1`. Always. Also on your hypothetical machine. If your `char` is five octets and your `int` is ten octets, then `sizeof(char) == 1` and `sizeof(int) == 2`. – 5gon12eder Oct 11 '15 at 02:54
  • 2
    @hexturtle Per the C standard ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf ) **6.5.3.4 The sizeof operator**: "When applied to an operand that has type `char`, `unsigned char`, or `signed char`, (or a qualified version thereof) the result is 1." – Andrew Henle Oct 11 '15 at 02:54
  • Not all machines and architectures are C Standard-compliant. My point is a pedantic one, but nonetheless, if you want to malloc space for n number of chars, then it is preferable to write `sizeof(char) * n` rather than assume a char is one byte. Again, for almost all practical purposes, a char will be one byte. Most people probably make this assumption in their code. However, if I run your code on a machine with an oh so slightly different architecture where chars are not defined to be a single byte, then your code will not work as expected. – fvgs Oct 11 '15 at 03:02
  • 4
    @hexturtle If you write code in C, you are not programming the hardware but the abstract machine specified by the C standard. And that standard is very precise about `sizeof(char)`. It is the compiler's job to convert your program for the abstract machine into a binary the real hardware can execute and still behaves exactly as the standard mandates. If it doesn't, then your C implementation is broken. The question is tagged "C" so we may assume a conforming C implementation. – 5gon12eder Oct 11 '15 at 03:17
  • Fair enough; from the C programmer's perspective, it is safe to assume a char has a size of one. What one means on the hardware side may still vary with the implementation. But for the C programmer, that distinction doesn't really factor in. This post was interesting: http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1 – fvgs Oct 11 '15 at 03:39
  • @hexturtle: This question is tagged "c" not "hexturtle's hypothetical fake C language that has `sizeof` defined wrong". :-) – R.. GitHub STOP HELPING ICE Oct 11 '15 at 04:26
  • It's my mistake. I was conflating the idea of how a char is physically represented and defined by the architecture with C's abstraction for what a char is. – fvgs Oct 11 '15 at 04:42
3

There are two reasons you would get a segmentation fault here. Before going into that, I have to mention that allocating a new buffer each time you read in an additional byte is tremendously inefficient. It is almost always preferable to allocate some reasonably sized buffer and then expand it in reasonably sized chunks.

That said, the first problem in your code is memcpy(buffer, temp, sizeof(temp)). When you do sizeof(temp), you're going to get the number of bytes for a pointer on your system. Probably 4 or 8. What you actually want here is the length of your old buffer. This is a problem initially, since your buffer starts off having a size of 1 byte. You then try to copy a total of 4 or 8 bytes (probably), not all of which are part of your buffer.

Second, when you do buffer[bufferSize] = '\0', you're actually writing the \0 to the byte after the end of your buffer. You would want to use bufferSize - 1 for this.

fvgs
  • 21,412
  • 9
  • 33
  • 48
  • "Second, when you do `buffer[bufferSize] = '\0'`, you're actually writing the `\0` to the byte after the end of your buffer. You would want to use `bufferSize - 1` for this." The way the code is written, when the loop breaks `bufferSize` characters have been read into the array. So `buffer[bufferSize - 1] = '\0'` will truncate the last character read. – Andrew Henle Oct 11 '15 at 02:50
  • 1
    This is very true. You still want to write the `\0` within the bounds of your buffer by using `bufferSize - 1`. However, it will be important to consider what data may already be at that location. As Andrew points out, based on the OP's code, that memory location would contain a char that was read in. – fvgs Oct 11 '15 at 02:53
0

I get a segmentation fault. Any idea why that happens?

Because you're writing beyond allocated memory.

This is what realloc is for.

You don't need to call realloc to grow your buffer every time you read a new byte, instead double the size of the buffer every time it needs to grow until it hits some practical maximum and then inform your user. This is how data-structures like C++'s vector<T> work.

Dai
  • 141,631
  • 28
  • 261
  • 374
  • Who knows why some idiots downvote willy-nilly. I guess it makes them feel more important. There is nothing wrong with the answer simply because it makes a reference, by way of example, to the way vector works. – David C. Rankin Oct 11 '15 at 02:49
0

There are two errors in your code.

1) memcpy(buffer, temp, sizeof(temp)); This line is copying a constant number of bytes each time. sizeof(temp) is not the allocated size of the the array, but the size of the pointer, which is most likely 4 or 8 bytes. So for the first few iterations you are actually writing more than the allocated space, and after that just not copying enough. You want to replace sizeof(temp) with bufferSize, the same value you used to allocate the buffer.

2) At the end you have buffer[bufferSize] = '\0';. This however is writing a byte past the end of allocated buffer. buffer is bufferSize bytes which means its addressable index values are 0 - bufferSize-1. However replacing this line with buffer[bufferSize-1] will replace the last character with a zero terminator, so you'll always lose the last character of input. You instead need to allocate 1 byte more. Personally I'd replace the allocation line with:

buffer = (char*)malloc(bufferSize+1);

waterjuice
  • 829
  • 5
  • 14