
Simple program: reads a name and a surname (John Smith) from a .txt file via fscanf, adds spaces, prints the name in the console (just as it's written in the .txt).

If compiled and run on Win10 via

Microsoft (R) C/C++ Optimizing Compiler Version 19.14.26433 for x86

the following code does not produce the same output for the same input across different launches of the same .exe (no recompiling). For each input there seem to be several possible outputs, and the program appears to pick one of them at random.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() 
{
    char input_file_name[255];
    FILE * input_file;
    char name[255];
    input_file = fopen ("a.txt","r");
    do
    {       
        if (strlen(name) != 0 )
            name[strlen(name)] = ' ';

        fscanf (input_file, "%s", name + strlen(name) * sizeof(char));
    }while(!feof(input_file));
    fclose (input_file);
    printf("Name:%s\n", name);
    system("pause");
    return 0;
}

I will list a couple of inputs and the outputs they produce. As not all characters are printable, I will write them as \ascii_code instead, such as \97 = a. The most common anomalies are \31 (Unit Separator) added at the very front of the string and \12 (NP form feed, new page) or \17 (device control 1) right before the surname (after the first space).

  1. For "John Smith":

    • "John Smith" (proper output)
    • "\31 John Smith"
  2. For "Atoroco Coco"

    • "Atoroco \12Coco"
    • "\31 Atoroco \16Coco"
  3. For "Mickey Mouse"

    • "Mickey Mouse" (proper)
    • "\31 Mickey\81Mouse" (There is a \32 (space) in the string right before the \81, but the console doesn't show the space?!)

If compiled on a different machine (MacOS, compiler unknown), it seems to work properly each time, that is, it simply prints the .txt's contents.

Why are there multiple outputs produced, seemingly at random? Why are these characters (\31, \12 etc.) in particular added, and no others?

John Smith
  • Turn on your compiler warnings, and **mind them**! (Hint: `name` is used without a valid assignment (or initialization).) – pmg Nov 28 '18 at 11:47
  • You must (try to) read before you can call `feof()`. Fix this one first. – Alexey Frunze Nov 28 '18 at 11:47
  • What's in `name[]` on the first iteration through the loop? – Alexey Frunze Nov 28 '18 at 11:58
  • @AlexeyFrunze On the first iteration for "Mickey Mouse", name[] is either "\31 Mickey" or "Mickey" – John Smith Nov 28 '18 at 12:06
  • @pmg `name[0] = '\0';` seems to have fixed it, or at least it produced the right output for 20+ times in a row. I did not think I have to set any of its elements to anything - I thought it's some random garbage that I'll be overwriting anyway. Where can I read more about it? Why what was happening was actually happening? – John Smith Nov 28 '18 at 12:06
  • Only `static` and global variables are automatically initialized (and you should avoid them while you're learning C). You have to initialize (or assign a value later but before you use the variable) all other kinds of variables. – pmg Nov 28 '18 at 12:10

1 Answer

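Your code invokes Undefined Behavior (UB), since it reads name while it is still uninitialized: the very first strlen(name) call scans whatever bytes happen to be in the array. Read more in What is Undefined Behaviour in C?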

We will initialize it, and make sure the null terminator is there. Standard string functions, like strlen(), depend on the null terminator to mark the end of the string.
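To make the dependency on the null terminator concrete, here is a minimal sketch. The helper `append_word` is hypothetical (it is not part of the original program); it only works if the buffer it receives already holds a valid, null-terminated string, which is exactly what an initializer such as `char name[255] = "";` guarantees.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append `word` to the string in `buf`, separated by
   a single space. It relies on strlen(), so `buf` must already contain a
   null-terminated string (for example, an empty one). */
static void append_word(char *buf, size_t cap, const char *word)
{
    size_t len = strlen(buf);                  /* scans buf up to the first '\0' */
    size_t need = len + (len != 0 ? 1 : 0) + strlen(word) + 1;

    assert(need <= cap);                       /* caller must provide enough room */
    if (len != 0)
        buf[len++] = ' ';                      /* overwrites the old terminator */
    strcpy(buf + len, word);                   /* writes the word and a new '\0' */
}
```

With `char name[255] = "";`, calling `append_word(name, sizeof name, "John")` and then `append_word(name, sizeof name, "Smith")` should leave `name` holding "John Smith". Without the initializer, the first `strlen()` would start from indeterminate bytes.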

Then, you need to make sure that you read something before you call feof(). Moreover, it's a good idea to check what fscanf() returns, which is the number of items successfully read.
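One common way to follow this advice is to let the return value of fscanf() drive the loop, so feof() is not needed as a loop condition at all. The sketch below assumes a hypothetical helper `read_words` and a scratch buffer of 64 bytes; neither comes from the original program, and the `%63s` width is there only to keep a long word from overflowing that scratch buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read whitespace-separated words from `fp` and join
   them with single spaces into `out`. Returns the number of words stored. */
static int read_words(FILE *fp, char *out, size_t cap)
{
    char word[64];
    size_t len = 0;
    int count = 0;

    assert(cap > 0);
    out[0] = '\0';                             /* start from an empty string */

    /* fscanf returns the number of items converted, so the loop ends on
       end-of-file or on a failed conversion. */
    while (fscanf(fp, "%63s", word) == 1) {
        size_t wlen = strlen(word);
        size_t sep = (len != 0) ? 1 : 0;

        if (len + sep + wlen + 1 > cap)
            break;                             /* not enough room left; stop */
        if (sep)
            out[len++] = ' ';
        memcpy(out + len, word, wlen + 1);     /* copies the terminating '\0' too */
        len += wlen;
        count++;
    }
    return count;
}
```

For a file containing "John Smith", `read_words(fp, name, sizeof name)` should return 2 and leave "John Smith" in `name`, with no trailing space, because the separator is only written once another word has actually been read.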

Putting all together, we get:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() 
{
    FILE * input_file;
    char name[255] = "\0"; // initialize it so that it has a null terminator (the rest of the array is zeroed too)
    input_file = fopen ("a.txt","r");
    if (input_file == NULL) // fopen failed, e.g. a.txt does not exist
        return 1;
    do
    {       
        if (strlen(name) != 0 )
            name[strlen(name)] = ' ';
    } while (fscanf (input_file, "%s ",  name + strlen(name) * sizeof(char)) == 1 && !feof(input_file));
    fclose (input_file);
    printf("Name:%s\n", name);
    return 0;
}

Output (for "georgios samaras"):

Name:georgios samaras

gsamaras
  • Do you know why the added characters were persistently `\31` `\12` and so on, over and over again? Set aside the fact that's UB, any guesses why were they added at all? Even if I knew I have to initialize first, I'd think in this case I'm simply using a block of memory that is not assigned to anything yet and can be overwritten by anything at any time, but why specifically those characters? – John Smith Nov 28 '18 at 12:12
  • Garbage values that were already there @JohnSmith, I *guess*, which is what happens with UB! – gsamaras Nov 28 '18 at 12:13
  • Does `char name[255] = (char *)malloc(255*sizeof(char));` do the trick as well? That is, does initializing an array via setting one of its elements to '\0' automatically call `malloc()`? – John Smith Nov 28 '18 at 18:24
  • No @JohnSmith, this is very wrong! `malloc()` is used to dynamically allocate memory. In your case, the `name` array already has memory reserved for it, since it's an array of fixed size. Hope that helps. – gsamaras Nov 28 '18 at 18:50
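To illustrate the distinction made in the last comment, here is a small sketch. An array declared as `char name[255] = "";` gets its 255 bytes from the declaration itself and no `malloc()` call is involved; `malloc()` is a separate mechanism whose result is stored in a pointer, not in an array. The helper `copy_to_heap` is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of `src`, or NULL if
   the allocation fails. The caller owns the result and must free() it. */
static char *copy_to_heap(const char *src)
{
    size_t size;
    char *p;

    assert(src != NULL);
    size = strlen(src) + 1;                    /* room for the '\0' as well */
    p = malloc(size);                          /* result goes into a pointer */
    if (p != NULL)
        memcpy(p, src, size);
    return p;
}
```

In a caller, `char name[255] = "";` needs no matching `free()`, whereas a pointer obtained from `copy_to_heap("John Smith")` has to be released with `free()` once it is no longer used.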