
I am using this code to read a file and print each character (byte) on a separate line.

It works well with ASCII:

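#include <stdio.h>
#include <stdlib.h>

/* Print every byte read from fp on its own line. */
void
preprocess_file (FILE *fp)
{
  int cc;

  for (;;)
    {
      cc = getc (fp);   /* getc returns one byte, or EOF */
      if (cc == EOF)
        break;
      printf ("%c\n", cc);
    }
}

int
main (int argc, char *argv[])
{
  preprocess_file (stdin);

  exit (0);
}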

But when I use it with UTF-8 encoded text, it prints unreadable characters such as:

ï
»
؟
ط
§
ظ
„
ظ
…
ط
¤
ط
´
ط

Any advice?

Thanks

user1200219
  • I don't know the C libraries well enough to tell you how to fix this, but you should stop assuming that 1 byte == 1 character. In many encodings - including UTF-8 - that's simply not true, at least not for all characters. – Jon Skeet Nov 09 '13 at 09:24
  • Possible duplicate: http://stackoverflow.com/questions/2113270 – Orel Eraki Nov 09 '13 at 09:27

1 Answer


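To be Unicode-aware, read wide characters with fgetwc (declared in <wchar.h>) instead of getc, which returns one byte at a time.

For fgetwc to decode UTF-8, the program must also run in a UTF-8 locale: call setlocale(LC_ALL, "") at startup so it picks up the environment's settings, and make sure the environment (LANG or LC_ALL) names a UTF-8 locale. That is already the default on modern Linux systems.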

Robin Green