2

I'm curious why when reading less than 4 bytes from a file, the output is corrupted.

Here is my test file:

user@UnixVM:~/labs$ cat foo.txt
helloworld

And a simple program to read from the file:

int main()
{
        int file=0;
        if((file=open("foo.txt",O_RDONLY)) < -1)
                return 1; 
        char buffer[11];        
        read(file,buffer,3);
        printf("%s\n",buffer);
        return 0;
}

The output is corrupted and may be different between executions:

user@UnixVM:~/labs$ gcc -Wall lab1_4.c -o lab1_4 ; ./lab1_4
hel2
user@UnixVM:~/labs$ gcc -Wall lab1_4.c -o lab1_4 ; ./lab1_4
hel▒

But every time I make number of bytes to read greater or equal to 4 (read(file,buffer,4);), it works fine.

  • 1
    You need to terminate strings (to send to `printf()`) with a `'\0'`. Try `char buffer[11] = "";` – pmg Oct 18 '18 at 07:38
  • 3
    Cheap fix -- initialize your buffer, e.g. `char buffer[11] = "";` then don't write more than 10 characters. In C, every function that expects a *string* expects that string to be *nul-terminated*. It is up to you to do the accounting on your character arrays you want to use as a string. If the character following the last valid character in the array isn't `'\0'` -- then it's not a string, it's just an array. Also note that functions meant to fill buffers with strings (e.g. like `fgets`, POSIX `getline` -- ensure the resulting buffer is *nul-terminated*) – David C. Rankin Oct 18 '18 at 07:39

1 Answers1

6

Your output is "corrupted" because buffer does not contain a NUL terminated C string. Read more about undefined behavior. Be scared (UB sometimes appears to work, and that might explain what you experiment).

So before your call to read add memset(buffer, 0, sizeof(buffer)) to clear your buffer. Or initialize it with char buffer[11] =""; (both are nearly equivalent and likely, with optimizations enabled e.g. gcc -O2, to generate the same machine code). Since your buffer is 11 bytes long and you read at most 3 bytes you'll then be sure that it is NUL terminated after the read.

Be sure to compile with all warnings and debug info (so gcc -Wall -Wextra -g lab1_4.c -o lab1_4 in your case). Read How to debug small programs

Read carefully the documentation of read(2) and of every function you are using. Notice the return count from read. You should test and use it.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • personally I'd be adding the NUL based on the return value of `read`, rather than pre-initialising the entire buffer with `\0` before hand. – Alnitak Oct 18 '18 at 07:42
  • `char buffer[11] = "";` is enough.It ensures initialization of the array. First character is a `'\0'` because is is what initializer contains, and rules for aggregate initialization ensure that all remaining bytes are initialized to 0. – Serge Ballesta Oct 18 '18 at 07:44
  • Yes, that might be a useful optimization. In practice, a system call like `read` is always *much more* costly than a `memset` (unless the buffer is huge). – Basile Starynkevitch Oct 18 '18 at 07:44
  • @BasileStarynkevitch why reading more than 4 bytes has no such issues? – usingnamespace Oct 18 '18 at 07:52
  • 1
    @usingnamespace "luck" - you just happened to hit some memory that already had NUL characters in it. – Alnitak Oct 18 '18 at 08:02
  • @BasileStarynkevitch I don't consider that an optimisation, as such. It's more a case of "why bother pre-filling" when only a single NUL is required, and then only because the OP wants to use `printf` to display it. Heck, I've been programming in C for 30+ years but I couldn't be 100% sure that the initialisation via `char buffer[11] = ""` actually guarantees NUL initialisation of the entire buffer. If I wanted that I'd consider a more explicit `char buffer[11] = { '\0', }`. – Alnitak Oct 18 '18 at 08:21
  • Both are doing the same as a `memset` and are initializing the *entire* buffer with zero bytes. – Basile Starynkevitch Oct 18 '18 at 08:22
  • if I write a function `size_t foo() { char buffer[11] = ""; return strlen(buffer); }` and then look at the `clang` assembly output with `-O2` I see the explicit initialisation to NUL, and then the strlen call. Curiously, if I replace the "" with a non-empty constant the compiler just replaces the code with an explicit return of the length of that constant. OTOH, `gcc` will recognise either an empty string or a non-empty string constant and return its length directly, but if I use `char buffer[11] = { 65, }` instead of `char buffer[11] = "A"` it defeats that optimisation. – Alnitak Oct 18 '18 at 08:26
  • Read about the [as-if rule](https://stackoverflow.com/a/15718279/841108) – Basile Starynkevitch Oct 18 '18 at 08:28
  • @BasileStarynkevitch oh, sure, it was just an observation about how the optimisers actually work. The as-if rule ought to allow clang to `return 0` as GCC does when given an empty string, but yet it doesn't. Give it a _non-empty_ string initialiser, though, and it does. – Alnitak Oct 18 '18 at 08:32