3

My team does not pay attention to file encoding (and that is correct, because humans should not be bothered by file encodings). However, some files are saved in UTF-8 and some in a regional encoding (cp1250).

I need to do two things:

  1. Force UTF-8 on all files that will be created in the future
  2. Convert all existing files with a given extension (or at least *.cs) to UTF-8

How can I achieve these goals using Visual Studio, ReSharper plugins, or PowerShell?

I tried to do #2 with PowerShell, but the result is a mess (sometimes it removes or adds the last line). There is probably some free software I can use for that. Point #1 is more important to me.

General Grievance
Shadow

4 Answers

3

Yes, it's possible.

Force UTF-8 on all files

Use .editorconfig, as @Richard previously mentioned. Starting with Visual Studio 2017 version 15.3, .editorconfig support was fixed and improved. This simple .editorconfig at the solution level is enough to ensure each *.cs file is saved in UTF-8 without BOM:

root = true

[*.cs]
charset = utf-8

Moreover, Visual Studio then converts any existing file that is manually opened and saved in it.
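To see which files still need that open-and-save treatment, a small audit script can help. The following is a minimal sketch in Python (not part of the original answer, and not PowerShell); the helper names `classify_encoding` and `audit` are hypothetical, and it assumes that "not valid UTF-8" is a good enough signal for a legacy code page.

```python
import codecs
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Return 'utf-8', 'utf-8-bom', or 'other' for the raw bytes of a file."""
    try:
        data.decode("utf-8")  # strict decoding raises on invalid byte sequences
    except UnicodeDecodeError:
        return "other"  # probably a legacy code page such as cp1250
    # Valid UTF-8; report separately whether it starts with a byte order mark.
    return "utf-8-bom" if data.startswith(codecs.BOM_UTF8) else "utf-8"


def audit(root: str, pattern: str = "*.cs") -> dict[str, str]:
    """Map every file under root that matches pattern to its classification."""
    return {
        str(path): classify_encoding(path.read_bytes())
        for path in Path(root).rglob(pattern)
    }

# Example use: list everything that is not plain UTF-8 without BOM.
# for path, kind in sorted(audit(".").items()):
#     if kind != "utf-8":
#         print(kind, path)
```

Pure ASCII files are reported as `utf-8`, which is harmless because ASCII is a subset of UTF-8.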

Convert all existing code files to UTF-8

I tested some answers from the thread "Save all files in Visual Studio project as UTF-8" and they worked badly: non-Latin characters (Cyrillic in my case) were converted into unreadable glyphs. By contrast, Visual Studio itself does the "open-save" conversion flawlessly.
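One plausible cause of such garbling (an assumption here, not something the tested answers confirm) is that the script decodes the bytes with a different legacy code page than the one the file was saved in. A minimal Python sketch of the effect, using a hypothetical helper `reencode_to_utf8` and assuming the Cyrillic file was written in Windows-1251:

```python
def reencode_to_utf8(raw: bytes, source_codepage: str) -> bytes:
    """Decode raw bytes with the given code page and re-encode them as UTF-8."""
    return raw.decode(source_codepage).encode("utf-8")


text = "Привет"              # sample Cyrillic text
raw = text.encode("cp1251")  # bytes as a Windows-1251 editor would save them

# Wrong assumption about the source code page: no error is raised,
# but every character is silently mapped to a different one.
garbled = reencode_to_utf8(raw, "cp1250").decode("utf-8")

# Correct assumption: the text survives the conversion unchanged.
intact = reencode_to_utf8(raw, "cp1251").decode("utf-8")

print(garbled == text)  # False
print(intact == text)   # True
```

The conversion itself cannot detect the mistake, so the source code page has to be known per file (or per team) before a batch conversion is run.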

To automatically open and re-save all code files in a solution, use a simple R# trick:

  1. Set any R# code style rule applicable to all your files to a value that clearly contradicts your company's code conventions. Brace layout is an obvious choice.
  2. Apply it to the whole solution using the Code Cleanup feature (Ctrl+E,C by default). Choose the simplest built-in "Reformat Code" template to minimize changes.
  3. After all files have been formatted and saved, revert the R# rules to their original values and run Code Cleanup once again.

All your *.cs files should be saved in UTF-8 after that (the same idea works for other file types supported by R#). Pretty formatting comes as a bonus.

Ilya Chumakov
2

Re. #1: There is the option

Environment | Documents | Save documents as Unicode when data cannot be saved in codepage

but it only applies when the data cannot be saved in the current code page, so it does not always take effect. It appears there is no way to force this (and no likely extensions). Ever considered writing an extension :-) ?

Re. #2: it should be doable with PowerShell (but a missing final end of line might trip up the simplest approaches). However, see https://stackoverflow.com/a/850751/67392
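As an illustration of how to avoid the end-of-line problem, here is a minimal sketch in Python rather than PowerShell. It works on raw bytes, so line endings and the presence or absence of a final newline are never touched. The function names are hypothetical, cp1250 as the fallback code page is taken from the question, and dropping a UTF-8 BOM is an assumption made to match a BOM-less target.

```python
import codecs
from pathlib import Path


def to_utf8_bytes(data: bytes, fallback: str = "cp1250") -> bytes:
    """Return data as UTF-8 without BOM, leaving line endings untouched."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # assumption: the target has no BOM
    try:
        data.decode("utf-8")  # strict check only; the result is discarded
        return data           # already valid UTF-8
    except UnicodeDecodeError:
        # Not UTF-8: interpret the bytes in the fallback code page instead.
        return data.decode(fallback).encode("utf-8")


def convert_tree(root: str, pattern: str = "*.cs", fallback: str = "cp1250") -> list[str]:
    """Convert matching files under root in place; return the paths that changed."""
    changed = []
    for path in Path(root).rglob(pattern):
        original = path.read_bytes()
        converted = to_utf8_bytes(original, fallback)
        if converted != original:
            path.write_bytes(converted)
            changed.append(str(path))
    return changed
```

Calling `convert_tree(".")` from the source root would rewrite only the files whose bytes actually change. Bytes that are undefined in cp1250 make the fallback decoding raise `UnicodeDecodeError`, which is a useful hint that the file was saved in some other code page; running this on a clean working copy under version control makes the result easy to review.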

Edit: This seems to be a common request (see User Voice). One of the comments on that User Voice request notes that in VS2017 you can use .editorconfig to set the default encoding of files.

Community
Richard
1

These days the most effective solution is to:

  1. Set the encoding for .cs files in the [*.cs] section of the .editorconfig file:
[*.cs]
charset = utf-8
  2. Run the dotnet format tool: dotnet format

That's it.

(I initially posted this as a comment, but Ilya Chumakov suggested making it an answer.)

Lev
0

PowerShell 5.1 script; run it in the source root:

Get-ChildItem -Include *.cs -Recurse -File | ForEach-Object {
    $file = $_.FullName

    $mustReWrite = $false
    # Try to read as UTF-8 first; the strict decoder throws if
    # bytes that are invalid as UTF-8 are encountered.
    try
    {
        # Discard the result so the file content is not written to the pipeline.
        $null = [IO.File]::ReadAllText($file, [Text.UTF8Encoding]::new($false, $true))
    }
    catch [System.Text.DecoderFallbackException]
    {
        # Fall back to Windows-1250
        $content = [IO.File]::ReadAllText($file, [Text.Encoding]::GetEncoding(1250))
        $mustReWrite = $true
    }

    # Rewrite as UTF-8 without BOM (the default for [IO.File]::WriteAllText)
    if ($mustReWrite)
    {
        Write-Output "Converting from 1250 to UTF-8: $file"
        [IO.File]::WriteAllText($file, $content)
    }
    else
    {
        Write-Output "Already UTF-8-encoded: $file"
    }
}