19

It seems Visual Studio 2017 always saves new files as UTF8-BOM. It also seems this was not the case with earlier versions of Visual Studio, but I could not find any documentation.

Earlier versions also had an "Advanced Save Options\Encoding" option that allowed changing the encoding of newly saved files; it is missing in VS2017.

Questions:

  • Are all file types saved with UTF8-BOM encoding in VS2017?
  • Is it possible to configure the encoding for new files in VS2017?
  • Will VS2017 change the encoding of "old" files which don't have UTF8-BOM?
  • Is there any documentation about this topic?
Pang
Manuel
  • Possible duplicate of [How to set standard encoding in Visual Studio](https://stackoverflow.com/questions/696627/how-to-set-standard-encoding-in-visual-studio) – Arthur Attout Apr 03 '18 at 06:51
  • This no longer seems to apply, since it targeted VS2008 and the encoding seems to have changed in more recent versions. – Manuel Apr 03 '18 at 07:30
  • I believe the equivalent of _Advanced Save Options\Encoding_ is now under _File -> Save ... As_: on the dialog, the _Save_ button is a drop-down button with a _Save with Encoding_ option. – Grx70 Apr 06 '18 at 07:28
  • https://stackoverflow.com/questions/43323941/inconsistent-line-endings-visual-studio-community-2017/43324108#43324108 – Andrew Truckle Apr 10 '18 at 14:47

5 Answers

11

Earlier versions also had an "Advanced Save Options\Encoding" option that allowed changing the encoding of newly saved files; it is missing in VS2017.

This feature already exists! You can save files with a specific character encoding to support bi-directional languages. You can also specify an encoding when opening a file, so that Visual Studio displays the file correctly.

To save a file with encoding

  1. From the File menu, choose Save File As, and then click the drop-down button next to the Save button. The Advanced Save Options dialog box is displayed.
  2. Under Encoding, select the encoding to use for the file.
  3. Optionally, under Line endings, select the format for end-of-line characters.

Are all file types saved with UTF8-BOM encoding in VS2017

In my case, VS stores all the files with CodePage 1252 encoding.

Is it possible to configure the encoding for new files in VS2017

My Visual Studio version is 15.6.1. Some people had the same problem as you in earlier releases of 2017, but the reported reply was: "We have fixed this issue and it's available in Visual Studio 2017 15.3".

If that does not work, for C++ projects take a look at /utf-8 (Set Source and Executable character sets to UTF-8).

Will VS2017 change the encoding of "old" files which don't have UTF8-BOM

By default, Visual Studio detects a byte-order mark to determine if the source file is in an encoded Unicode format, for example, UTF-16 or UTF-8. If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you have specified a code page by using /utf-8 or the /source-charset option. Some people encountered a problem that came from the .editorconfig file, as below:

root = true

[*]
indent_style = tab
indent_size = 4
tab_width = 4
trim_trailing_whitespace = true
insert_final_newline = true
charset = utf-8 

That final charset line is probably doing it... but I'm not asking for 'utf-8-with-bom'!
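The detection rule described earlier in this answer (look for a byte-order mark first, otherwise fall back to a code page) can be sketched roughly as follows. This is only an illustration in Python, not Visual Studio's actual logic: the function name `detect_encoding` is hypothetical, and the `fallback` parameter stands in for "the current user code page" or whatever /utf-8 or /source-charset would select.

```python
import codecs

def detect_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a Python codec name for raw file bytes.

    Assumption: "cp1252" plays the role of the current user code page.
    Passing fallback="utf-8" loosely imitates forcing UTF-8 for BOM-less files.
    """
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # A UTF-16 BOM (either byte order); the "utf-16" codec reads the BOM itself.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM found: nothing in the file says which encoding it uses.
    return fallback

print(detect_encoding(codecs.BOM_UTF8 + b"int x;"))   # utf-8-sig
print(detect_encoding(b"int x;"))                     # cp1252
print(detect_encoding(b"int x;", fallback="utf-8"))   # utf-8
```

UTF-32 BOMs are deliberately ignored here to keep the sketch short.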

Oskar
Amirhossein Mehrvarzi
  • My issue was that in VS2019, with an .editorconfig setting of charset = utf-8, the BOM was getting stripped when I edited and saved files that previously were UTF-8 with BOM. The fix was to change the .editorconfig setting to charset = utf-8-bom. You kind of allude to this in your answer, but I thought it was worth pointing out explicitly, since I searched for hours and couldn't find it explicitly documented _anywhere_ on the web. – RonC Jan 27 '21 at 21:39
  • @RonC Yes, the *editorconfig* file overrides some configurations like indentation. – Amirhossein Mehrvarzi Jan 27 '21 at 22:33
  • I think it basically overrides everything listed in it. Which is great, but the difference between charset = utf-8 and charset = utf-8-bom might not jump out at a person unless they know to look for it. – RonC Jan 27 '21 at 22:36
7

You can use EditorConfig with the charset property to define encoding for source files in VS 2017.
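To make the two UTF-8 values of the charset property concrete at the byte level, here is a small Python sketch. It does not involve Visual Studio or any EditorConfig library; the mapping `CHARSET_TO_CODEC` and the helper `encode_for_charset` are hypothetical names, and the sketch assumes `utf-8` means "no BOM" and `utf-8-bom` means "BOM first".

```python
import codecs

# Hypothetical mapping from EditorConfig charset values to Python codec names.
# Python's "utf-8-sig" codec writes the BOM (EF BB BF) before the encoded text.
CHARSET_TO_CODEC = {
    "utf-8": "utf-8",
    "utf-8-bom": "utf-8-sig",
}

def encode_for_charset(text: str, charset: str) -> bytes:
    """Encode text as a file with the given charset value would store it."""
    return text.encode(CHARSET_TO_CODEC[charset])

sample = "int x = 1;"
without_bom = encode_for_charset(sample, "utf-8")
with_bom = encode_for_charset(sample, "utf-8-bom")

print(without_bom[:3])  # b'int'
print(with_bom[:3])     # b'\xef\xbb\xbf'
```

The only difference between the two results is the three leading bytes, which is why the setting is easy to overlook.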

Sergey Vlasov
  • It seems I can set UTF8 for the charset property, but will the files have the byte order mark (BOM) set in this case as well? – Manuel Apr 04 '18 at 06:01
  • utf-8 value for the charset property means no BOM, utf-8-bom value adds BOM. – Sergey Vlasov Apr 04 '18 at 18:23
  • But utf-8-bom is not listed as possible value for charset – Manuel Apr 05 '18 at 12:46
  • The powers-that-be suggest never using a BOM with utf-8. Just saying. – Jive Dadson Apr 06 '18 at 05:15
  • Why should you avoid using BOM for source files? – Manuel Apr 06 '18 at 07:58
  • Only a small note. I think the EditorConfig *(plug-in)* for Visual Studio 2015 had a problem with the `utf-8-bom` value, plus another problem. There is no problem with the plug-in in Visual Studio 2017 *(Community)*: it works fine, and both values *(`utf-8` and `utf-8-bom`)* are listed as possible values. When the files are saved, the encoding (BOM) changes according to the settings. *(I ran a simple test yesterday after migrating my project to VS2017, checking the Git changes in the files.)* – Julo Apr 09 '18 at 13:40
  • Mainly because some old programs choke on the "BOM", @Manuel. UTF-8 doesn't require it, since byte order within a single code point is only a concern when code points consist of multiple bytes; as UTF-8 only has 8-bite code points, and code points themselves always have to be in the correct order, UTF-8 is always ordered correctly. And while it can serve as a generic UTF-8 signature (and I personally prefer it as such), there are some relatively well-known cases where programs will choke on the BOM and parse a UTF-8 text file incorrectly. So, they generally don't recommend it because of that. – Justin Time - Reinstate Monica Jul 27 '19 at 17:57
  • @JustinTime awesome explanation! – Nicholas Petersen Sep 12 '19 at 14:46
  • Thank you, @NicholasPetersen. (And, everyone, please ignore the slight typo, where "8-bite" was supposed to be "8-bit". ;P) – Justin Time - Reinstate Monica Sep 13 '19 at 17:24
3

Apparently the "Advanced Save Options\Encoding" option has been removed from the "File" menu due to uncommon use. This was the reason given by a Visual Studio Team member (see this).

The option is still there, but you have to do a couple extra clicks.

  1. In the menu bar, go to File -> Save As
  2. When the Save File Dialog appears, the Save button has a down arrow. Click it.
  3. Select Save with Encoding...

Once you save a file to a certain format (I believe the one you're looking for is Unicode (UTF-8 without signature) - Codepage 65001), Visual Studio should in theory not change it on a whim.

Here is the problem, though: once you remove the BOM, no reader can know with 100% certainty that a given text file is actually UTF-8. This is just from observing the behaviour, but if you use Save with Encoding... and select Unicode (UTF-8 without signature) - Codepage 65001 (which is UTF-8 without BOM), the BOM will be removed. However, when you close the file, reopen it, and go to Advanced Save Options again, you will notice that Visual Studio assumed the text format was Codepage 1252. The file will of course still be valid, because that code page maps every possible byte value to some character, but it may give you strange results in some fringe cases.

One thing it will not do, is add the BOM back in (at least I have never seen it). Hope this helps.
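The "strange results in some fringe cases" can be reproduced outside Visual Studio. The following Python sketch only imitates the effect described above: text is stored as UTF-8 without a BOM and then read back under the assumption that it is Codepage 1252. The function name `reopen_with_assumed_encoding` and the sample strings are made up for illustration, and Python's `cp1252` codec is slightly stricter than the description above, since it rejects a few unassigned byte values.

```python
def reopen_with_assumed_encoding(text: str, assumed: str = "cp1252") -> str:
    """Store text as UTF-8 without a BOM, then decode it with the assumed encoding."""
    raw = text.encode("utf-8")   # the plain "utf-8" codec writes no BOM
    return raw.decode(assumed)   # what a reader sees if it guesses `assumed`

ascii_source = 'printf("hello");'
accented_source = 'printf("café");'

# Pure ASCII text is byte-for-byte identical in UTF-8 and Codepage 1252.
ascii_result = reopen_with_assumed_encoding(ascii_source)
print(ascii_result == ascii_source)        # True

# A non-ASCII character becomes two bytes in UTF-8, read back as two characters.
misread = reopen_with_assumed_encoding(accented_source)
print(misread == accented_source)          # False
print(len(misread) - len(accented_source)) # 1
```

This matches the observation that most source files survive the wrong guess unchanged, and only string literals or comments with non-ASCII characters get mangled.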

Nik
  • This is definitely the answer for at least two of the questions. Do you know this because of experience, or is there some documentation? – Manuel Apr 11 '18 at 11:04
  • I haven't found any specific documentation for VS on this. All from Experience. In fact, we recently had an issue where for a Win32 app, a Codepage 1252 source file decided to become UTF-8 no signature (although we can't be sure whether it was VS and not the programmer). There was no issue with the code itself, but when deployed, some strings in the UI became a bunch of weird characters. Took us some time to trace this down to a change in Encoding. I imagine this is exactly the reason VS defaults to UTF-8 with signature now. – Nik Apr 11 '18 at 14:15
  • There are various markers a text reader can look for in the text file to "try" and determine the encoding, but there is really no way to tell for sure. It really depends on the file content. I would say for 99% of source files, there is no difference between Codepage 1252 and UTF-8 No BOM. But obviously you can't really count on it, as any text file is technically a valid Codepage 1252 file. If your source file does have characters that are specifically from the UTF-8 set, it's a coin toss as to how a text reader will interpret it... without a BOM, of course. – Nik Apr 11 '18 at 14:28
1

Check the Fix File Encoding extension, which prevents Visual Studio 2017/2015/2013/2012 from adding a BOM to UTF-8 files.

Normally, when you edit a UTF-8 file in Visual Studio, it adds the byte order mark (BOM) sequence 0xEF, 0xBB, 0xBF to the beginning of the file. Sometimes this confuses other applications that process the file afterwards. You can select an encoding manually (File - Advanced Save Options... or File > Save As... > Save with Encoding...), but you need to do it each time you reopen the file.

Also, this extension will answer most of your questions.

Fix File Encoding automatically detects when a UTF-8 file is opened in Visual Studio and sets its encoding to UTF-8 without signature. If you don't edit the file, it remains unmodified. If you edit the file, it will be saved without the BOM.

Fix File Encoding lets you configure which files to encode based on the file path and the file extension. By default, only .htm and .html files are protected from Visual Studio.
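For files that have already been saved, the three-byte sequence 0xEF, 0xBB, 0xBF mentioned above can also be removed (or added) with a short script. This is a minimal Python sketch, not how the extension itself works; the function names `strip_utf8_bom` and `add_utf8_bom` are hypothetical, and both operate on raw bytes so the rest of the file is left untouched.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def add_utf8_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

original = b"\xef\xbb\xbf<html></html>"
stripped = strip_utf8_bom(original)
print(stripped)                # b'<html></html>'
print(add_utf8_bom(stripped))  # b'\xef\xbb\xbf<html></html>'
```

To apply it to a real file, read the file in binary mode (`open(path, "rb")`), transform the bytes, and write them back in binary mode (`"wb"`), ideally after making a backup.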

ElasticCode
  • I knew about this plugin. But I actually want to do the opposite. I want to ensure that the files are always encoded UTF8-BOM, since I don't have issues with other applications and it seems more reasonable to add the byte order mark. It is easy enough to convert the files initially, but I was interested in what the default behaviour of VS2017 is. – Manuel Apr 11 '18 at 11:01
  • They mention that by default it adds the byte order mark (BOM). This is why VS offers another option when saving: Save with Encoding. – ElasticCode Apr 11 '18 at 11:12
1

Unfortunately this is too long for a comment on Nik's answer, so I am posting it as another answer:

  • VS saves all source code files (.cpp, .cs, .h, etc) and Web files (.htm(l), .css, .xml) in UTF-8 with BOM (with signature in MS jargon).

  • However, VS saves text files created by VS in the code page of the local settings, for example code page 1252 for Western European cultures. VS is clever enough to detect characters that can’t be encoded in the default code page and will prompt you to encode in UTF-8. Visual Studio will automatically save in UTF-8, with BOM of course, if you check "Save documents as Unicode when data cannot be saved in codepage" in the dialog box "Tools/Options/Environment/Documents".

  • You can override the encoding for each single file by using "Save As", but you can't override the default encodings in the VS Options.

  • But you can override the default settings with an EditorConfig file. How: https://learn.microsoft.com/en-us/visualstudio/ide/create-portable-custom-editor-options?view=vs-2019.