using System;
using System.IO;
using System.Linq;
using System.Text;

var memStream = new MemoryStream();
using (var sw = new StreamWriter(memStream, Encoding.UTF8, 4194304 /* 4 MiB */, leaveOpen: true))
{
    var str = new string(Enumerable.Repeat(' ', 10240 /* 10 * KiB */).ToArray());
    Console.WriteLine(str.Length);                         // number of characters
    Console.WriteLine(Encoding.UTF8.GetBytes(str).Length); // UTF-8 bytes of the string itself
    sw.Write(str);
    sw.Flush();
    Console.WriteLine(memStream.Length);                   // bytes actually in the stream
}
// Output
// ---------
// 10240
// 10240
// 10243

// Output I was expecting
// ---------
// 10240
// 10240
// 10240

I read the StreamWriter documentation on MSDN thoroughly, including StreamWriter.Write(String) (MSDN Doc StreamWriter.Write), but I didn't find anything mentioning that this API can write extra bytes to the stream. I am using .NET Core 3.1; I am guessing the same behavior holds for .NET Core 2.0 and .NET Framework, although I have not tested that. Is this a bug, expected behavior, or am I missing something?

amit9oct
  • Although you have included what result you got, you didn't include what you expected to see. Please add. – phuzi Feb 19 '20 at 10:41
  • I see 10243 as the final line of output, which is what I'd expect from a BOM being added at the start. Do you *repeatedly* get 10305? – Jon Skeet Feb 19 '20 at 10:42
  • [No repro](https://dotnetfiddle.net/CXiQ6Y) – Sweeper Feb 19 '20 at 10:45
  • I also only see 10243 as the final size. – Matthew Watson Feb 19 '20 at 10:45
  • I get the same output as @JonSkeet on .NET Core 3.1.0. – yaakov Feb 19 '20 at 10:46
  • Now I am getting 10243. I think the memory stream I was using initially already had some bytes written to it. Now I understand: the expected output is 10243, i.e. 10240 bytes for the string plus 3 bytes for the BOM. Thanks :) – amit9oct Feb 19 '20 at 10:47
  • same here; but: if you don't want a BOM, use `new UTF8Encoding(false)` – Marc Gravell Feb 19 '20 at 10:47
  • Does this answer your question? [StreamWriter and UTF-8 Byte Order Marks](https://stackoverflow.com/questions/5266069/streamwriter-and-utf-8-byte-order-marks) – Andreas Feb 19 '20 at 10:52
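As a quick check of the arithmetic in the comments (10240 bytes of text plus a 3-byte BOM), here is a minimal sketch that compares what actually lands in the MemoryStream with `GetPreamble().Length + GetByteCount(...)` for the same encoding. The class and method names (BomAccounting, WrittenLength, ExpectedLength) are hypothetical, the 4096-byte buffer size is an arbitrary choice, and it assumes the writer starts on an empty MemoryStream, as in the question.

```csharp
using System;
using System.IO;
using System.Text;

static class BomAccounting
{
    // Hypothetical helper: writes `text` through a StreamWriter using `encoding`
    // and returns how many bytes ended up in the (initially empty) MemoryStream.
    public static long WrittenLength(string text, Encoding encoding)
    {
        using (var memStream = new MemoryStream())
        {
            using (var sw = new StreamWriter(memStream, encoding, 4096, leaveOpen: true))
            {
                sw.Write(text);
                sw.Flush();
            }
            return memStream.Length;
        }
    }

    // Hypothetical helper: the expected size is the encoding's preamble (the BOM)
    // plus the bytes of the encoded text. GetByteCount does not include the preamble.
    public static long ExpectedLength(string text, Encoding encoding) =>
        encoding.GetPreamble().Length + encoding.GetByteCount(text);

    static void Main()
    {
        var str = new string(' ', 10240);
        Console.WriteLine(WrittenLength(str, Encoding.UTF8));  // 10243
        Console.WriteLine(ExpectedLength(str, Encoding.UTF8)); // 10243 = 3 + 10240
    }
}
```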

2 Answers


You can prevent the BOM from being written by passing a UTF8Encoding that does not emit a UTF-8 identifier, i.e. new UTF8Encoding(false):

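using System;
using System.IO;
using System.Linq;
using System.Text;

var memStream = new MemoryStream();
// new UTF8Encoding(false): encode as UTF-8 but do not emit the EF BB BF byte order mark
using (var sw = new StreamWriter(memStream, new UTF8Encoding(false), 4194304 /* 4 MiB */, leaveOpen: true))
{
    var str = new string(Enumerable.Repeat(' ', 10240 /* 10 * KiB */).ToArray());
    Console.WriteLine(str.Length);
    Console.WriteLine(Encoding.UTF8.GetBytes(str).Length);
    sw.Write(str);
    sw.Flush();
    Console.WriteLine(memStream.Length); // 10240: no BOM in front of the text
}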
Andreas
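To see why passing new UTF8Encoding(false) helps, it can be useful to compare the preambles of the two encodings: Encoding.UTF8 reports a 3-byte preamble, while a UTF8Encoding constructed with encoderShouldEmitUTF8Identifier set to false reports an empty one, so, as far as I can tell, StreamWriter then has no BOM to write. This is a minimal sketch; PreambleDemo and WrittenLength are hypothetical names, and the buffer size of 4096 is arbitrary.

```csharp
using System;
using System.IO;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: size of an empty MemoryStream after writing `text` with `encoding`.
    public static long WrittenLength(string text, Encoding encoding)
    {
        using (var memStream = new MemoryStream())
        {
            using (var sw = new StreamWriter(memStream, encoding, 4096, leaveOpen: true))
            {
                sw.Write(text);
            } // Dispose flushes the writer; leaveOpen keeps memStream usable afterwards.
            return memStream.Length;
        }
    }

    static void Main()
    {
        Encoding withBom = Encoding.UTF8;               // UTF-8 with EF BB BF preamble
        Encoding withoutBom = new UTF8Encoding(false);  // encoderShouldEmitUTF8Identifier: false

        Console.WriteLine(withBom.GetPreamble().Length);    // 3
        Console.WriteLine(withoutBom.GetPreamble().Length); // 0

        var str = new string(' ', 10240);
        Console.WriteLine(WrittenLength(str, withBom));    // 10243
        Console.WriteLine(WrittenLength(str, withoutBom)); // 10240
    }
}
```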

When I run this locally, I get:

10240
10240
10243

On further inspection, the extra 3 bytes turn out to be at the beginning of the stream: 239 187 191, or EF BB BF in hex. This is the Byte Order Mark (BOM): https://en.wikipedia.org/wiki/Byte_order_mark
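That inspection can be reproduced by dumping the first few bytes of the stream with BitConverter.ToString, which formats bytes as hyphen-separated hex. A minimal sketch; BomInspector and LeadingBytesHex are hypothetical names, and reading 4 bytes is an arbitrary choice that also shows the first space character (0x20) following the BOM.

```csharp
using System;
using System.IO;
using System.Text;

static class BomInspector
{
    // Hypothetical helper: writes `text` with `encoding` into an empty MemoryStream
    // and returns the first `count` bytes formatted as hex, e.g. "EF-BB-BF-20".
    public static string LeadingBytesHex(string text, Encoding encoding, int count)
    {
        using (var memStream = new MemoryStream())
        {
            using (var sw = new StreamWriter(memStream, encoding, 4096, leaveOpen: true))
            {
                sw.Write(text);
            } // Dispose flushes the writer, including the preamble if the encoding has one.
            byte[] bytes = memStream.ToArray();
            return BitConverter.ToString(bytes, 0, Math.Min(count, bytes.Length));
        }
    }

    static void Main()
    {
        var str = new string(' ', 10240);
        Console.WriteLine(LeadingBytesHex(str, Encoding.UTF8, 4));           // EF-BB-BF-20
        Console.WriteLine(LeadingBytesHex(str, new UTF8Encoding(false), 4)); // 20-20-20-20
    }
}
```

With Encoding.UTF8 the three BOM bytes come first; with new UTF8Encoding(false) the stream starts directly with the spaces.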

To remove these extra bytes from the output, use new UTF8Encoding(false), which omits the BOM, instead of Encoding.UTF8 when creating the StreamWriter:

using (var sw = new StreamWriter(memStream, new UTF8Encoding(false), 4194304 /* 4 MiB */, leaveOpen: true))
phuzi