Here is a minimal "working" example:

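#include <stdio.h>
#include <stdlib.h>

int main (int argc, char* argv[])
{
    size_t num = 10;                       // number of bytes to read

    FILE* fp = fopen("test.txt", "r");     // test.txt contains character sequence
    if (fp == NULL) {                      // file missing or not readable
        perror("fopen");
        return EXIT_FAILURE;
    }

    char* ptr = malloc(num + 1);           // +1 for '\0'
    if (ptr == NULL) {
        fclose(fp);
        return EXIT_FAILURE;
    }

    size_t got = fread(ptr, sizeof(char), num, fp);   // read up to num bytes from file
    ptr[got] = '\0';                       // terminate after the bytes actually read

    printf("%s\n", ptr);        // output: ´╗┐abcdefg

    free(ptr);
    fclose(fp);

    return 0;
}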

I would like to read some letters from a text file that contains all the letters of the alphabet on a single line. I want my array to store the first 10 letters, but the first 3 characters shown in the output are weird symbols (see the comment at the printf statement).

What am I doing wrong?

ROMANIA_engineer
Marcel S.

1 Answer

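The issue is that your file is encoded using UTF-8. UTF-8 is backwards-compatible with ASCII (which is what your code effectively assumes), so plain letters are stored as the same single bytes, but a UTF-8 file can also contain bytes outside the ASCII range.

In particular, many programs put a BOM (Byte Order Mark) at the start of the file. In UTF-8 the BOM is the three-byte sequence EF BB BF; it says nothing about byte order there and only marks the file as UTF-8. Your fread call reads those three bytes along with the letters, which is why only seven letters follow them. If you print the BOM using a legacy Windows console code page, you get the three symbols you saw (´╗┐ is how code page 850 displays those bytes).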

Whatever program you used to create your text file was automatically inserting that BOM at the start of the file. Notepad++ is notorious for doing this. Check the save options and make sure to save either as plain ASCII or as UTF-8 without BOM. That will solve your problem.
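If re-saving the file is not an option, the program can also step over the BOM itself. The following is only a minimal sketch: `skip_utf8_bom` is a hypothetical helper name, not something from the question or the C library, and it assumes the stream is seekable and still positioned at the start of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original code).
   If the stream begins with the UTF-8 BOM bytes EF BB BF, leave the read
   position just past them; otherwise go back to the first byte.
   Assumes fp is seekable and positioned at the start of the file.
   Returns 1 if a BOM was skipped, 0 otherwise. */
int skip_utf8_bom(FILE *fp)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];

    assert(fp != NULL);

    size_t got = fread(head, 1, sizeof head, fp);
    if (got == sizeof head && memcmp(head, bom, sizeof bom) == 0)
        return 1;   /* BOM consumed; the next read starts at the text */

    rewind(fp);     /* no BOM (or file too short): start over */
    return 0;
}
```

In the program from the question, this would be called once, right after fopen succeeds and before the fread, so that the ten bytes read are the first ten letters.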

LambdaBeta
  • That solved my problem. Never heard about BOM before. I am using notepad++ and your hint and the settings for Notepad++ explained in this [answer](http://stackoverflow.com/questions/8432584/how-to-make-notepad-to-save-text-in-utf-8-without-bom) made it also run on my machine. Thanks! – Marcel S. Dec 08 '16 at 08:10