I've been reading answers that explain how to get the size of a string, either in memory or in a file.

My intention is to determine the number of bytes that a string will occupy, in a specified encoding, when written to a file.

However, my function does not return the expected result when I check the size of a string for Encoding.UTF8, Encoding.Unicode (UTF-16) or Encoding.UTF32.

This is what I'm doing:

''' ----------------------------------------------------------------------
''' <summary>
''' Gets the number of bytes that a string will occupy when written to a file.
''' </summary>
''' ----------------------------------------------------------------------
<DebuggerStepThrough>
<Extension>
Public Function SizeInFile(ByVal sender As String,
                           Optional ByVal encoding As Encoding = Nothing) As Integer

    If (encoding Is Nothing) Then
        encoding = System.Text.Encoding.Default
    End If

    Return encoding.GetByteCount(sender)

End Function

This is how I'm testing it. In the code below, the function reports a string size of 2 bytes, but when the string is written to a file, the file size is 4 bytes:

Dim str As String = "Ñ"
Console.WriteLine(String.Format("Size of String : {0}", str.SizeInFile(Encoding.Unicode)))

File.WriteAllText(".\Test.txt", str, Encoding.Unicode)
Console.WriteLine(String.Format("Size of txtfile: {0}", New FileInfo(".\Test.txt").Length))

What am I missing to evaluate the string size correctly?

In C# or VB.NET.

ElektroStudios

1 Answer

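A file may begin with a byte order mark (BOM) that helps the reader detect which encoding was used. File.WriteAllText writes the BOM of the encoding you pass in, whereas Encoding.GetByteCount counts only the bytes of the characters themselves, which is why your file is 2 bytes larger than the function's result.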

The BOM for UTF-8 is 3 bytes: EF BB BF.

For UTF-16 it is the 2-byte code unit FEFF, written in the byte order of the encoding: FF FE for Encoding.Unicode (little endian), FE FF for Encoding.BigEndianUnicode.

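For UTF-32 it is the 4-byte value 0000FEFF, again in the byte order of the encoding: FF FE 00 00 for Encoding.UTF32 (little endian), 00 00 FE FF for big-endian UTF-32.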
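
One way to account for this is to add the length of the encoding's preamble (the .NET name for the BOM bytes returned by Encoding.GetPreamble) to the byte count. The following is only a sketch: SizeInFileWithBom and the StringSizeExtensions module are hypothetical names, not part of the original code, and it assumes the file is written by an API that emits the preamble, as File.WriteAllText does when given one of these encodings.

```vbnet
Imports System.IO
Imports System.Runtime.CompilerServices
Imports System.Text

Module StringSizeExtensions

    ' Hypothetical helper: the byte count of the text plus the encoding's preamble (BOM).
    ' Assumes the writer emits the preamble at the start of the file.
    <Extension>
    Public Function SizeInFileWithBom(ByVal value As String,
                                      Optional ByVal enc As Encoding = Nothing) As Long

        If enc Is Nothing Then
            enc = Encoding.Default
        End If

        ' GetPreamble returns an empty array for encodings that emit no BOM.
        Return CLng(enc.GetPreamble().Length) + enc.GetByteCount(value)

    End Function

End Module

Module Program

    Sub Main()
        Dim str As String = "Ñ"
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "Test.txt")

        ' 2 bytes of BOM + 2 bytes for the character = 4
        Console.WriteLine("Size of String : {0}", str.SizeInFileWithBom(Encoding.Unicode))

        File.WriteAllText(filePath, str, Encoding.Unicode)
        Console.WriteLine("Size of txtfile: {0}", New FileInfo(filePath).Length)
    End Sub

End Module
```

Because the preamble length comes from the encoding instance itself, an encoding created without a BOM, such as New UTF8Encoding(False), contributes 0 bytes here, so the result should still match what such an encoding writes.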

Nir
  • Ouch! I totally missed the BOM. Now I see that I asked a very stupid question: I knew what the BOM is, but I just didn't take it into account. Thanks a lot! – ElektroStudios Oct 26 '15 at 11:16