
I'm interested in the encoding of the character in the computer.

When I open my xxx.c with Visual Studio Code, how does VS Code detect the encoding of my file and interpret its sequence of 0s and 1s? Further, how does Visual Studio Code (or the computer system itself) display the characters on the screen according to that bit sequence and the character encoding?

Thank you!

I also use Chinese in my projects. Sometimes file encoding really drives me crazy. For example, a correct UTF-8 file created by editor A was destroyed by some text editor B that interpreted it as a GBK file, and editor A could never restore it correctly.

I searched a lot, but most answers seem too abstract or irrelevant. I want to figure out how the software and the computer system (or operating system) cooperate to get this simple but important job done!

jack chan
  • Referencing the vscode [source code](https://github.com/Microsoft/vscode), it uses the jschardet library for guessing. But, by [default](https://code.visualstudio.com/docs/getstarted/settings), it assumes UTF-8. – Tom Blodget Jun 08 '18 at 17:14

1 Answer


First things first, regarding "can never get it back": always use source code control.

"How the software and the computer system (or operating system) cooperate together to make this simple but important job done!": They don't. That's the problem!

Short history: Many decades ago people used small character sets. The idea was a system would always use the same one. Simple. Every time a text file was transferred between systems, it would be immediately transcribed to the local character encoding. Then came the globalization of file exchanges and systems needed to hold text files in different encodings. There was no general way of recording what the encoding was. In 1991 came the huge character set Unicode. Languages (VB4, Java), operating system APIs (Win32), file systems (NTFS), … began adopting it. However, its encodings (UTF-8, UTF-16) are just yet more possibilities for which encoding a text file uses. Many programs that read text files either rely on the old system of a system default encoding or guess ("detect").
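To make the "default or guess" point concrete, here is a minimal Python sketch (Python is my choice for illustration; it is not what VS Code or any particular editor runs). The helper `guess_decode` is hypothetical: it tries strict UTF-8 first and falls back to GBK, which is roughly the simplest possible "detection". The last part imitates a tool that wrongly assumes GBK and then re-saves the file, assuming it substitutes a replacement character for bytes it cannot decode.

```python
text = "中"                         # one Chinese character
utf8_bytes = text.encode("utf-8")   # 3 bytes: E4 B8 AD
gbk_bytes = text.encode("gbk")      # 2 bytes: D6 D0


def guess_decode(raw, fallback="gbk"):
    """Hypothetical helper: try strict UTF-8, otherwise use the fallback encoding."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so assume the legacy encoding.
        return raw.decode(fallback, errors="replace"), fallback


# A tool that wrongly assumes GBK reads the UTF-8 bytes and then re-saves them.
# The trailing byte has no partner, so it is replaced and the original is lost.
misread = utf8_bytes.decode("gbk", errors="replace")
resaved = misread.encode("gbk", errors="replace")

print(guess_decode(utf8_bytes))     # decoded as UTF-8
print(guess_decode(gbk_bytes))      # falls back to GBK
print(resaved == utf8_bytes)        # the re-saved bytes differ from the original
```

The same text is stored as different bytes under each encoding, and nothing in the file itself says which one was used. Once a tool has replaced bytes it could not decode and written the result back, the original bytes are no longer in the file, which is why the other editor cannot recover them.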

In the programming world, some languages require source files to use a specific encoding (say UTF-8); in others, the tools default to a specific encoding (say UTF-8). In most cases, the toolset provided with a C or C++ implementation will have a consistent set of rules. If you also use an IDE or other form of project system, you can set the encoding for the entire project and, in some cases, for specific files.

So, the only solution is to only use tools that work for you and to properly configure them. If it hurts, stop doing it.


Aside: On the topic of programming and default character encodings, be careful not to get tricked by various language libraries' use of the system default character encoding, unless that is exactly what's needed. Otherwise, you are giving your users the same problem that you are encountering. (In Java, just avoid it with explicit arguments. In C and C++ libraries, encoding is combined into locales. But note that many systems initialize a program to use the default character encoding.)
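As an illustration of "avoid it with explicit arguments", here is a small Python sketch (the aside mentions Java and C; Python is used here only to keep the example short). The helpers `write_text` and `read_text` and the file name `demo.c` are made up for the example. They always pass `encoding="utf-8"`, so the result does not depend on whatever default the system reports.

```python
import locale
import os
import tempfile


def write_text(path, text):
    """Hypothetical helper: always write UTF-8, never the system default."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def read_text(path):
    """Hypothetical helper: always read UTF-8, never the system default."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()


# What the system would suggest if no encoding were given; it varies by machine.
default_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.c")
    source = "/* 注释 */ int main(void) { return 0; }\n"
    write_text(path, source)
    round_trip = read_text(path)
    with open(path, "rb") as f:
        on_disk = f.read()          # the raw bytes actually stored in the file

print(default_encoding)
print(round_trip == source)
```

Because both sides name the encoding, the file reads back identically on a machine whose default is GBK and on one whose default is UTF-8.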

Tom Blodget
  • I know there are `font` files in Windows 10. I cannot figure out when the operating system uses which `font file`, or how these `font files` work. Thank you! – jack chan Jun 10 '18 at 01:27
  • That's quite another topic. A font file maps a character to a set of drawing parameters for a glyph, which visually represents the character. A font file may or may not have a glyph for every character you use. Applications almost always use OS-provided facilities for drawing text. A windowing OS provides these because of the impact on overall performance and user experience. In the Win32 GDI API, the low-level function is DrawText. However, in most cases, applications use standard UI controls, which call DrawText or an equivalent when needed, so the application code does not need to call it explicitly. – Tom Blodget Jun 10 '18 at 19:09