I am doing something rather simple in C#: writing a list of strings to a text file. My write method is:

public static bool TextToFile(string fileName, List<string> inString) {
    if (!Directory.Exists(Path.GetDirectoryName(fileName)))
        Directory.CreateDirectory(Path.GetDirectoryName(fileName));
    try {
        if (File.Exists(fileName))
            File.Delete(fileName);

        const int BufferSize = 65536;  // 64 Kilobytes
        using (StreamWriter sw = new StreamWriter(fileName, true, Encoding.UTF8, BufferSize)) {
            if (inString.Count > 0) {
                foreach (string str in inString) {
                    sw.WriteLine(str);
                }
            }
            else
                sw.WriteLine("");
        }
        return true;
    }
    catch {
        return false;
    }
}

I am getting extra characters at the beginning of the first line, though. They do not show in a regular text editor, but when I opened the file in UltraEdit and switched to hex mode, I saw them: [image: hex view of the start of the file]

My programs that read the text file do see the characters and are confused by them. My list of strings is clean. I sometimes write 100 MB text files, so I set the buffer to 64 KB, but I tried leaving it at the default with the same results. I am on Windows 7 64-bit, using VS 2013.

Jason Watkins
jmaeding
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders May 14 '15 at 23:56
  • http://en.wikipedia.org/wiki/Byte_order_mark – Ňɏssa Pøngjǣrdenlarp May 15 '15 at 00:00
  • I just noticed that if I save the resulting text file with ANSI encoding, that weird text goes away. Then I tried saving it back to UTF-8, and it came back. Is this proper behavior? I never want hidden characters in a text file... – jmaeding May 15 '15 at 00:01
  • Indeed, it's the BOM. Should I leave it there or not? – jmaeding May 15 '15 at 00:04
  • I see, like this: New StreamWriter("Foobar.txt", False, utf8WithoutBom). There are several other posts on this once you know the keywords. Thanks. – jmaeding May 15 '15 at 00:06
  • It does not seem to be the cause, but might be related: length-prefixed strings (http://stackoverflow.com/questions/1488486/why-does-binarywriter-prepend-gibberish-to-the-start-of-a-stream-how-do-you-avo) – heltonbiker May 15 '15 at 00:38
  • To control UTF-8 file saving with or without BOM in UltraEdit, read UE forum answer [What's the best default new file format?](https://www.ultraedit.com/forums/viewtopic.php?f=7&t=15438#p52594) UltraEdit indicates detected encoding of text file in status bar at bottom of UE main window for active file. – Mofi May 16 '15 at 13:59

1 Answer

I had this exact problem and, thanks to the comments on this question, identified these characters as the byte order mark (BOM). That let me change the encoding to exclude the BOM using this code:

using (StreamWriter sw = new StreamWriter(fileName, true, new UTF8Encoding(false), BufferSize))
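
To check that the change has the intended effect, the following is a minimal sketch that writes the same lines twice, once with and once without the BOM, and then inspects the first three bytes of each file for the UTF-8 preamble (EF BB BF). The class name BomDemo, the helper names WriteLines and StartsWithUtf8Bom, and the temp file names are hypothetical and chosen only for illustration. It also assumes that passing false for the append argument is acceptable, which overwrites any existing file and so makes a separate File.Delete call unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes the lines as UTF-8. emitBom controls whether the
    // three-byte preamble (EF BB BF) is written at the start of the file.
    public static void WriteLines(string path, IEnumerable<string> lines, bool emitBom)
    {
        // append: false truncates an existing file, so the writer starts at position 0.
        using (var sw = new StreamWriter(path, false, new UTF8Encoding(emitBom), 65536))
        {
            foreach (string line in lines)
                sw.WriteLine(line);
        }
    }

    // Returns true if the file begins with the UTF-8 byte order mark.
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(head, 0, head.Length);
            if (read < 3)
                return false; // file too short to contain a BOM
        }
        return head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }

    public static void Main()
    {
        // Hypothetical file names in the temp directory.
        string withBom = Path.Combine(Path.GetTempPath(), "bom_demo_with.txt");
        string withoutBom = Path.Combine(Path.GetTempPath(), "bom_demo_without.txt");
        var lines = new List<string> { "first line", "second line" };

        WriteLines(withBom, lines, true);
        WriteLines(withoutBom, lines, false);

        Console.WriteLine(StartsWithUtf8Bom(withBom));    // prints True
        Console.WriteLine(StartsWithUtf8Bom(withoutBom)); // prints False
    }
}
```

Encoding.UTF8 behaves like new UTF8Encoding(true), which is why the original code produced the extra bytes. Whether to keep the BOM depends on what reads the file: some Windows tools use it to detect UTF-8, while many parsers treat it as part of the first line.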
Phil