I keep getting this warning in my code:

warning C4819: The file contains a character that cannot be represented in the current code page (0). Save the file in Unicode format to prevent data loss

I've done what was suggested in this question with advanced save options and saving in Unicode without BOM, but that didn't work. I also tried "Use Unicode Character Set" in the project properties, but that didn't work either. Why would it not be able to deal with Unicode? For reference, I have this line, which prints a Unicode character with sprintf:

sprintf(mac, "\x02\x60\x8c%c%c%c", (num >> 16) & 0xff, (num >> 8) & 0xff, num & 0xff);
Raelz
  • Can you provide an example of the file in question? – babu646 Apr 19 '18 at 14:32
  • Here's a link to the file this code is from on github: https://github.com/mamedev/mame/blob/mame0185/src/devices/bus/isa/3c503.cpp I've gotten a lot of those warnings but they were mostly unicode characters in comments, this is the first one directly in code. – Raelz Apr 19 '18 at 14:40
  • Unable to reproduce. 3c503.cpp looks like pure ASCII text, Unix-style LF (0x0A) line endings, and TAB (0x09) indentation. – Eljay Apr 19 '18 at 15:28
  • For this file, it's specifically the "\x8c" in the sprintf line that I posted in the question, since that falls into unicode range and outside of ascii. The character parameters in sprintf as well can possibly be unicode; num is a random number using rand(). Those don't throw warnings though since it's runtime anyway. – Raelz Apr 19 '18 at 15:41
  • On Windows, a file with no BOM is considered ANSI-encoded, which is locale-dependent. The odd thing is the default code page seems to be 0. On US Windows it should be 1252. Something seems misconfigured. – Mark Tolonen Apr 19 '18 at 15:48
  • Actually most of the other errors say code page 932, which is Shift-JIS, which I did set as my locale long ago, so that makes sense, but I still don't understand why I can't override that with advanced save options; it just gives the same warning after trying to save as UTF-8. **Edit:** Oh, it seems saving with signature is working. Weird though, when I saved it as UTF-8 without signature, it still gave the error as code page 932. Ah well. – Raelz Apr 19 '18 at 16:40
  • Without a BOM signature, most Windows editors assume the ANSI locale, in your case, code page 932. There are compiler options to set the source character set and execution character set. See /source-charset and /execution-charset. /utf-8 is a shortcut to force both to UTF-8. Set them in advanced compiler options. – Mark Tolonen Apr 20 '18 at 01:10

1 Answer

\x8c is not a character as far as the file is concerned. It's a series of (ASCII) characters that the compiler will try to interpret into a character at compile time (which is when you're getting the error).

The compiler decides whether the file is in a Unicode encoding based on whether it has a BOM. Since you saved it without a BOM, it's being interpreted in your local code page, which is Shift-JIS.

Wikipedia tells me that, in Shift-JIS, \x8c is the first byte of a double-byte character. I suspect the next byte (\x25, which is the value for '%') is not a valid second byte, so the compiler can't map the \x8c to a Unicode character.
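To make the byte-pair argument concrete, here is a minimal sketch of a structural check for double-byte sequences. It assumes the commonly documented code page 932 ranges (lead bytes 0x81-0x9F and 0xE0-0xFC, second bytes 0x40-0x7E and 0x80-0xFC). The function names are made up for illustration, and this is not how the compiler itself performs the conversion.

```cpp
#include <cstdint>

// Assumed lead-byte ranges for double-byte characters in code page 932.
constexpr bool is_sjis_lead_byte(std::uint8_t b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Assumed second-byte ranges: 0x40-0x7E and 0x80-0xFC.
constexpr bool is_sjis_trail_byte(std::uint8_t b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// True when `lead` followed by `trail` is a structurally valid pair.
constexpr bool is_valid_sjis_pair(std::uint8_t lead, std::uint8_t trail) {
    return is_sjis_lead_byte(lead) && is_sjis_trail_byte(trail);
}

static_assert(is_sjis_lead_byte(0x8C), "0x8C starts a double-byte character");
static_assert(!is_sjis_trail_byte(0x25), "'%' (0x25) cannot be a second byte");
static_assert(!is_valid_sjis_pair(0x8C, 0x25), "0x8C followed by '%' is malformed");
```

Under those assumed ranges, 0x8C followed by 0x25 is not a valid pair, which is consistent with the warning appearing on that line.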

The diagnostic message is misleading, because it was written with the assumption that programmers would encounter this problem by putting actual Unicode code points into their source files without saving them as Unicode files.
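If changing the file encoding or compiler options is not convenient, one way to sidestep the warning might be to keep the raw byte out of a narrow string literal altogether. The following is a minimal sketch, assuming the goal of the sprintf line is only to produce six raw bytes (the fixed prefix 02 60 8C followed by the low 24 bits of num). `make_mac` is a hypothetical helper, not something from the original file.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: builds the 6-byte address from numeric constants,
// so no non-ASCII byte has to pass through a string literal.
std::array<std::uint8_t, 6> make_mac(std::uint32_t num) {
    return {{
        0x02, 0x60, 0x8C,                               // fixed prefix
        static_cast<std::uint8_t>((num >> 16) & 0xff),  // bits 23..16 of num
        static_cast<std::uint8_t>((num >> 8) & 0xff),   // bits 15..8 of num
        static_cast<std::uint8_t>(num & 0xff),          // bits 7..0 of num
    }};
}
```

Because the prefix is written as integer constants rather than as escapes inside a literal, the result should not depend on the source or execution character set.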

Adrian McCarthy