
I have a StreamWriter open to my file at the same time that I am reading from it, which seems to be causing problems (this is a smaller snippet of a larger body of code, shown only to illustrate the issue):

static void Main(string[] args)
{
    for (int i = 0; i < 3; i++)
    {
        using (FileStream stream = new FileStream("file.txt", FileMode.OpenOrCreate))
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8, false, 0x1000, true))
        using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8, 0x1000, true))
        {
            Console.WriteLine("Read \"" + reader.ReadToEnd() + "\" from the file.");
        }
    }
    Console.ReadLine();
}

The above code will output:

Read "" from the file.
Read "" from the file.
Read "?" from the file.

If the file already contains some text, the writer will append the BOM to the end despite never having been called to write anything:

Read "TEXT" from the file.
Read "TEXT?" from the file.
Read "TEXT??" from the file.

Why does it exhibit this behavior?

Alexandru
  • possible duplicate of [How do I ignore the UTF-8 Byte Order Marker in String comparisons?](http://stackoverflow.com/questions/2915182/how-do-i-ignore-the-utf-8-byte-order-marker-in-string-comparisons) – grovesNL Dec 04 '14 at 16:35
  • @grovesNL This is about StreamReader, not about GetString, and those answers don't help me. – Alexandru Dec 04 '14 at 16:41
  • @grovesNL even if it's a BOM value, I'd be surprised to see it **at the end** instead of at the beginning... – Adriano Repetti Dec 04 '14 at 16:41
  • @Alexandru if the BOM (0xFEFF) were at the beginning, then it would be about StreamReader. As it is... your file is probably just corrupted. – Adriano Repetti Dec 04 '14 at 16:42
  • @AdrianoRepetti I can reproduce it with 100% accuracy every time. I can delete that file, generate a new file, or use any existing file, and just the calls to StreamReader introduce byte order marks at the end of the file... maybe it's in the way I create my StreamReader. I will post an update soon. – Alexandru Dec 04 '14 at 16:45
  • @AdrianoRepetti Update posted. – Alexandru Dec 04 '14 at 16:48
  • Can you post a minimal complete program that shows this strange behaviour? – Taukita Dec 04 '14 at 17:11
  • @Alexandru: Do you have the file open in Notepad or another program as you're reading it? – grovesNL Dec 04 '14 at 17:20
  • @Taukita Updating now; I just figured out the source of the problem. It's not exactly what I expected, but it makes more sense that it is at the end. – Alexandru Dec 04 '14 at 17:20
  • @grovesNL Check my edit (will be up shortly)! – Alexandru Dec 04 '14 at 17:20
  • Updated with minimal code to reproduce the issue I am seeing (it seems it's the StreamWriter that is actually the problem; sorry for the mix-up, guys). But I still want to understand why this happens. – Alexandru Dec 04 '14 at 17:26
  • Having a reader and a writer on the same stream is usually not a good idea... – Thomas Levesque Dec 04 '14 at 17:42
  • @ThomasLevesque With proper synchronization, it should be manageable. C# sure does allow some crazy things to happen, doesn't it? :) – Alexandru Dec 04 '14 at 19:01
  • @Alexandru, well, it's possible of course, but it's error prone... don't do it unless you have a very good reason. – Thomas Levesque Dec 04 '14 at 20:27
  • @ThomasLevesque Well, I'm sure my issue above would come up regardless of using 2 streams, for example if I were to only have the `StreamWriter`, but inside my `using` block if I were to set `stream.Position = someInt64Value` it would surely have the same issue. – Alexandru Dec 04 '14 at 21:17

2 Answers


As I suggested in my earlier comment about byte order marks, what you want is to stop StreamWriter from adding a byte order mark. Whether it writes one depends on the encoding you pass to it.

For example, try passing it a UTF-8 encoding that does not emit a byte order mark:

static void Main(string[] args)
{
    for (int i = 0; i < 3; i++)
    {
        using (FileStream stream = new FileStream("file.txt", FileMode.OpenOrCreate))
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8, true, 0x1000, true))
        // UTF-8 without a byte order mark: its preamble is empty, so the writer never emits one
        using (StreamWriter writer = new StreamWriter(stream, new UTF8Encoding(false), 0x1000, true))
        {
            Console.WriteLine("Read \"" + reader.ReadToEnd() + "\" from the file.");
        }
    }
    Console.ReadLine();
}

Passing false to the UTF8Encoding constructor (its encoderShouldEmitUTF8Identifier parameter) tells the encoding not to provide a Unicode byte order mark, so StreamWriter has no preamble to write. This is described in the MSDN entry for the UTF8Encoding constructor.
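A quick way to see the difference is to compare the preambles the two encodings report. The standalone program below is only a sketch (the class name PreambleCheck is made up for illustration); Encoding.GetPreamble() returns the bytes a StreamWriter emits before its first output, so an empty array means no BOM is ever written.

```csharp
using System;
using System.Text;

class PreambleCheck
{
    // Returns the preamble (BOM) bytes an encoding asks writers to emit, as hex.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    static void Main()
    {
        // Encoding.UTF8 emits the UTF-8 identifier (EF BB BF).
        Console.WriteLine("Encoding.UTF8:           [" + PreambleHex(Encoding.UTF8) + "]");
        // encoderShouldEmitUTF8Identifier = false gives an empty preamble.
        Console.WriteLine("new UTF8Encoding(false): [" + PreambleHex(new UTF8Encoding(false)) + "]");
    }
}
```

This prints EF-BB-BF for Encoding.UTF8 and an empty pair of brackets for new UTF8Encoding(false).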

grovesNL
  • Yes, this works. I guess I was reading to the end of the stream using the StreamReader... then the writer would get disposed, and on dispose I suppose the StreamWriter, thinking it is at the start of the stream since it was never invoked, appends the UTF-8 BOM, which is not smart, because it should check the position of the FileStream to know where it's at. Without those marks, you'd just have to know the encoding up front to open and read the file. Am I right? – Alexandru Dec 04 '14 at 17:35
  • @Alexandru: Yes, it's more clearly expressed when you write to your `writer` before your `Console.WriteLine` call. Just try adding `writer.Write("test")` and watch how the byte order mark is appended. – grovesNL Dec 04 '14 at 17:42
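Alexandru's guess in the first comment is close: StreamWriter does look at the stream's position, but only once, in its constructor. If the stream is seekable and Position is greater than 0 at that moment, the writer assumes a preamble is already present and never writes one; otherwise it writes the preamble on its first flush, wherever the stream happens to be by then. The sketch below relies on that by creating the writer only after reading. It illustrates the behaviour and is not something proposed in either answer; a MemoryStream stands in for the file, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

class WriterAfterRead
{
    // Hypothetical variant: read first, and only then create the StreamWriter.
    public static byte[] ReadThenCreateWriter(byte[] initialContent, out string readBack)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(initialContent, 0, initialContent.Length);
            stream.Position = 0; // FileMode.OpenOrCreate also starts at position 0

            using (var reader = new StreamReader(stream, Encoding.UTF8, false, 0x1000, true))
            {
                readBack = reader.ReadToEnd(); // Position is now at the end of the data
            }

            // Seekable stream with Position > 0: the writer treats the preamble as already
            // written. With Position == 0 (empty file) it writes the BOM at the start instead.
            using (var writer = new StreamWriter(stream, Encoding.UTF8, 0x1000, true))
            {
                // Nothing is written here; Dispose still flushes.
            }

            return stream.ToArray();
        }
    }

    static void Main()
    {
        string text;
        byte[] a = ReadThenCreateWriter(Encoding.ASCII.GetBytes("TEXT"), out text);
        Console.WriteLine(BitConverter.ToString(a) + " read \"" + text + "\""); // 54-45-58-54 read "TEXT"

        byte[] b = ReadThenCreateWriter(new byte[0], out text);
        Console.WriteLine(BitConverter.ToString(b) + " read \"" + text + "\""); // EF-BB-BF read ""
    }
}
```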

Well, I think the writer wants to write a byte order mark even if you don't write anything. Reading moves the stream position to the end of the stream, so when you dispose the writer, it flushes the byte order mark at the end of the stream.
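The same sequence can be reproduced without a file. This sketch (the class and method names are hypothetical, and a MemoryStream stands in for file.txt) constructs the writer while the position is 0, lets the reader move to the end, and returns the raw bytes after the writer is disposed.

```csharp
using System;
using System.IO;
using System.Text;

class BomAtEndRepro
{
    // Mimics one loop iteration from the question on a MemoryStream,
    // so the raw bytes can be inspected afterwards.
    public static byte[] RunOnce(byte[] initialContent)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(initialContent, 0, initialContent.Length);
            stream.Position = 0; // FileMode.OpenOrCreate also starts at position 0

            using (var reader = new StreamReader(stream, Encoding.UTF8, false, 0x1000, true))
            using (var writer = new StreamWriter(stream, Encoding.UTF8, 0x1000, true))
            {
                // The writer was constructed while Position == 0, so it still
                // plans to emit the UTF-8 preamble on its first flush.
                reader.ReadToEnd(); // advances stream.Position to the end
            } // writer.Dispose() flushes: the preamble lands after the existing data

            return stream.ToArray();
        }
    }

    static void Main()
    {
        byte[] result = RunOnce(Encoding.ASCII.GetBytes("TEXT"));
        Console.WriteLine(BitConverter.ToString(result)); // 54-45-58-54-EF-BB-BF
    }
}
```

On the next pass, a reader using Encoding.UTF8 only skips a preamble at the very start of the data, so those trailing EF BB BF bytes decode to U+FEFF, which is the '?' in the question's output.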

Try this code:

    static void Main(string[] args)
    {
        for (int i = 0; i < 3; i++)
        {
            using (FileStream stream = new FileStream("sample.txt", FileMode.OpenOrCreate))
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8, false, 0x1000, true))
            using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8, 0x1000, true))
            {
                // Flush while the stream is still at position 0, so the BOM goes at the start of the file
                writer.Flush();
                Console.WriteLine("Read \"" + reader.ReadToEnd() + "\" from the file.");
            }
        }
        Console.ReadLine();
    }

You will see the expected behaviour, without the '?' symbols.
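One thing worth checking before adopting this: writer.Flush() writes the preamble at the stream's current position, which is 0 right after the file is opened. For a file that already starts with a UTF-8 BOM (such as one this loop created), the BOM is simply rewritten in place; for an existing file without a BOM, the first three bytes of its content are overwritten. A sketch on a MemoryStream (the class and method names are hypothetical) shows both cases.

```csharp
using System;
using System.IO;
using System.Text;

class FlushFirstCaveat
{
    // Applies the Flush-first pattern from this answer to an in-memory "file".
    public static byte[] FlushThenRead(byte[] initialContent, out string readBack)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(initialContent, 0, initialContent.Length);
            stream.Position = 0;

            using (var reader = new StreamReader(stream, Encoding.UTF8, false, 0x1000, true))
            using (var writer = new StreamWriter(stream, Encoding.UTF8, 0x1000, true))
            {
                writer.Flush();                // writes EF BB BF at position 0, leaving Position at 3
                readBack = reader.ReadToEnd(); // the reader starts reading at Position 3
            }

            return stream.ToArray();
        }
    }

    static void Main()
    {
        string text;

        // A file that already starts with a BOM: the BOM is rewritten in place.
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x54, 0x45, 0x58, 0x54 };
        byte[] a = FlushThenRead(withBom, out text);
        Console.WriteLine(BitConverter.ToString(a) + " read \"" + text + "\""); // EF-BB-BF-54-45-58-54 read "TEXT"

        // A file without a BOM: its first three bytes ("TEX") are overwritten.
        byte[] b = FlushThenRead(Encoding.ASCII.GetBytes("TEXT"), out text);
        Console.WriteLine(BitConverter.ToString(b) + " read \"" + text + "\""); // EF-BB-BF-54 read "T"
    }
}
```

So the approach is safe for files that this code creates itself, but opening a BOM-less file written by another program this way would damage it.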

Taukita
  • I wish I could accept two answers, but groves beat you to it. To anyone reading this: this is also a very solid approach. Taukita, this makes the writer always put the BOM at the beginning of the file. – Alexandru Dec 04 '14 at 17:46
  • This works really great. I've taken this approach in the library I'm writing, because what this gives you is proper BOM tags at the start of a new file. – Alexandru Dec 04 '14 at 19:00