Read text and print each (byte) character on a separate line

Question

I'm using this code to read a file and print each character (byte) on a separate line.

It works well with ASCII text:

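#include <stdio.h>
#include <stdlib.h>

/* Print every byte read from fp on its own line. */
void
preprocess_file (FILE *fp)
{
    int cc;   /* int rather than char, so EOF can be told apart from data */

    while ((cc = getc (fp)) != EOF)
        printf ("%c\n", cc);
}

int
main (void)
{
    preprocess_file (stdin);

    exit (0);
}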

But when I use it with UTF-8 encoded text, it prints unreadable characters such as:

ï
»
؟
ط
§
ظ
„
ظ
…
ط
¤
ط
´
ط

Any advice?

Thanks

I don't know the C libraries well enough to tell you how to fix this, but you should stop assuming that 1 byte == 1 character. In many encodings - including UTF-8 - that's simply not true, at least not for all characters. — Jon Skeet, Nov 09 '13 at 09:24
Possible duplicate: http://stackoverflow.com/questions/2113270 — Orel Eraki, Nov 09 '13 at 09:27

score 0 · Answer 1 · answered Nov 09 '13 at 09:51

To be Unicode-aware, you need to read with fgetwc instead of getc, so that each call returns one whole character rather than one byte.

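Also, for fgetwc to decode UTF-8, the program has to run in a UTF-8 locale. A UTF-8 locale is the default on modern Linux systems, but a C program starts in the "C" locale regardless of the environment, so call setlocale(LC_ALL, "") (from <locale.h>) at startup to adopt the environment's setting, e.g. LANG=en_US.UTF-8.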

Robin Green
