4

I have a very simple C# program which iterates over a number of files and replaces a string in all the files.

However, when I compare these files using Git, it highlights a change to all my files.

My C# code is:

string[] files = Directory.GetFiles(path, "*", SearchOption.AllDirectories);

// The loop variable must match the name used in the body (fileName).
foreach (string fileName in files)
{
    string fileText = File.ReadAllText(fileName, Encoding.UTF8);
    string newText = fileText.Replace("hello", "goodbye");
    File.WriteAllText(fileName, newText, Encoding.UTF8);
}

As far as I can tell, this looks fine. However, when I run the program and execute git status on the repository, I see differences in every file.

Using a program like GitHub Desktop or Sourcetree reveals the following changes:

GitHub Desktop: [screenshot]

Sourcetree: [screenshot]

Thank you for any tips or ideas anyone may have. They're greatly appreciated. :)

Liam
James Warner
  • 8
    I suspect it's writing a byte-order mark. Just a guess: the encoding you're using when writing the file is different from the existing file's encoding. Is the original file UTF-8? –  Jul 20 '18 at 14:22
  • 2
    By the way, you are writing the `original` text back to the file, `not` the transformed text. So @Amy might have something there –  Jul 20 '18 at 14:23
  • 1
    @Amy I think you're right! Adding a method to get the encoding for a file and using that seems to have fixed my issue. Thanks. MickyD that was just a typo in the question, but thanks for spotting it! – James Warner Jul 20 '18 at 14:29
  • Possible duplicate of [SaveFileDialog producing unrecognized character](https://stackoverflow.com/questions/44397355/savefiledialog-producing-unrecognized-character) – Raymond Chen Jul 20 '18 at 14:35

2 Answers

6

This character is the Unicode byte order mark (BOM). File.WriteAllText writes it at the start of the file when you pass Encoding.UTF8, because that encoding instance emits a BOM preamble.

If you want to write files without a BOM, create a UTF8Encoding instance that does not emit one:

Encoding utf8NoBom = new UTF8Encoding(false);

Then pass that instance as the third parameter of the WriteAllText method:

File.WriteAllText(fileName, fileText, utf8NoBom);
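
To check which of the two encodings adds the extra bytes, a small test along these lines may help. This is only a sketch: the BomDemo class and the StartsWithUtf8Bom helper are hypothetical names introduced here for illustration, and the scratch file comes from Path.GetTempFileName.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: true if the file begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // scratch file, deleted below

        // Encoding.UTF8 emits a preamble, so the file starts with a BOM.
        File.WriteAllText(path, "hello", Encoding.UTF8);
        Console.WriteLine(StartsWithUtf8Bom(path)); // True

        // UTF8Encoding(false) suppresses the preamble.
        File.WriteAllText(path, "hello", new UTF8Encoding(false));
        Console.WriteLine(StartsWithUtf8Bom(path)); // False

        File.Delete(path);
    }
}
```

If the files in the repository were originally saved without a BOM, the first variant adds three bytes to each of them, which would be enough for Git to report every file as changed.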
Martin Zikmund
  • 3
    Please note that for (at least) [mscorlib .NET 4.7.1](https://learn.microsoft.com/en-us/dotnet/api/system.io.file.writealltext?view=netframework-4.7.1#System_IO_File_WriteAllText_System_String_System_String_) the default encoding is UTF-8 without BOM (see remarks). If you stroll through the *Reference Source* you will notice that MS changed the default encoding to UTF-8 without BOM for other stuff as well (e.g. `StreamWriter`). – ckerth Jul 20 '18 at 14:46
2

Thanks to the comment from @Amy I have managed to identify the issue. I assumed all my files were encoded as UTF-8, but this wasn't the case.

Using the answer specified here, I was able to identify the encoding of each file and use it when reading from and writing to that file.

My code now looks like this (using the 'GetEncoding' method specified in this answer):

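string[] files = Directory.GetFiles(path, "*", SearchOption.AllDirectories);

// The loop variable must match the name used in the body (fileName).
foreach (string fileName in files)
{
    // Detect the file's existing encoding and reuse it for the write,
    // so a BOM is neither added nor removed.
    Encoding fileEncoding = GetEncoding(fileName);
    string fileText = File.ReadAllText(fileName, fileEncoding);
    string newText = fileText.Replace("hello", "goodbye");
    File.WriteAllText(fileName, newText, fileEncoding);
}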
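
The GetEncoding method itself is not shown above. A minimal sketch of what such a helper might look like, assuming it only inspects the byte order mark, is below. It is not necessarily the implementation from the linked answer. The EncodingSniffer class name is hypothetical, and the fallback to UTF-8 without BOM is an assumption that will not suit files saved in a legacy code page.

```csharp
using System.IO;
using System.Text;

public static class EncodingSniffer
{
    // Sketch: guess the encoding from the BOM at the start of the file.
    // Assumption: files without a BOM are treated as UTF-8 without BOM.
    public static Encoding GetEncoding(string fileName)
    {
        byte[] bom = new byte[4];
        int read;
        using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            read = fs.Read(bom, 0, 4);
        }

        // UTF-8 with BOM: EF BB BF
        if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
            return new UTF8Encoding(true);

        // UTF-32 must be checked before UTF-16: its LE BOM starts with FF FE too.
        if (read >= 4 && bom[0] == 0xFF && bom[1] == 0xFE && bom[2] == 0x00 && bom[3] == 0x00)
            return Encoding.UTF32; // little-endian, with BOM
        if (read >= 4 && bom[0] == 0x00 && bom[1] == 0x00 && bom[2] == 0xFE && bom[3] == 0xFF)
            return new UTF32Encoding(true, true); // big-endian, with BOM

        if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
            return Encoding.Unicode; // UTF-16 little-endian, with BOM
        if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian, with BOM

        // No BOM found: assume UTF-8 and do not add a BOM on write.
        return new UTF8Encoding(false);
    }
}
```

Because the returned encoding either emits a preamble or not, matching the original file, writing the text back with it should leave files that do not contain the search string byte-for-byte unchanged, at least for files that really are in one of the encodings above.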
James Warner