If I run

string myString = "*.txt";
Print("sizeof(char): " + sizeof(char) + " bytes");
Print("myString.Length * sizeof(char): " + (myString.Length * sizeof(char)) + " bytes");

It will print

sizeof(char): 2 bytes

myString.Length * sizeof(char): 10 bytes

But, if I run the code from the first answer to this question:

myString = "*.txt";
long size = 0;
using (Stream s = new MemoryStream())
{
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(s, myString);
    size = s.Length;
}
Print("myString Serialized Size: " + size + " bytes");

I get

myString Serialized Size: 29 bytes

Which of these is a more accurate representation of how much space my string is taking up in memory?

user430481
  • Why don't you read the comments in the answer? "This will put so much more. It adds the DLL name and version, ... this is not a way to calculate object size." – Camilo Terevinto May 31 '19 at 15:21
  • Neither tells you how much memory is actually required to hold the string in memory. The latter serializes a string object and reports the size of the serialized data, which includes object metadata such as the object type. The former gives the length of the text data in bytes, but that is not all the information a string holds. Take the first result, add 4 to 8 bytes for the string length field, and then round the result up to the next multiple of 4 or 8 to account for possible "wastage" due to memory alignment. This should be a good estimate... –  May 31 '19 at 15:21

1 Answer

Asking about the size (in bytes) of a string is complex:

  • internally, it will be UTF-16, so: two bytes per char, i.e. twice as many bytes as there are characters (assuming the string wasn't created over-sized, which is possible)
    • but the string object itself has the string length and the object overhead to consider, then there's "padding" etc
  • if you're talking about size in vanilla binary encodings, then you need to know which Encoding you're discussing (ASCII, UTF-8, UTF-16, etc.), plus you need to know whether or not you're including a BOM
  • the one thing you would not do is run it through BinaryFormatter; BinaryFormatter is a general-purpose serializer that includes type metadata, field names, etc.; in general, you should almost never use BinaryFormatter ... for anything :)
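
To make the second bullet concrete, here is a minimal sketch that asks a few of the built-in encodings how many bytes the text needs, with and without a BOM. The class name EncodedSizeSketch and the helper EncodedSize are hypothetical names made up for this illustration; Encoding.GetByteCount and Encoding.GetPreamble are the standard System.Text calls.

```csharp
using System;
using System.Text;

static class EncodedSizeSketch
{
    // Hypothetical helper: bytes needed to encode the text in the given
    // encoding, optionally adding that encoding's BOM (its "preamble").
    public static int EncodedSize(string text, Encoding encoding, bool includeBom)
    {
        int size = encoding.GetByteCount(text);
        if (includeBom)
        {
            size += encoding.GetPreamble().Length;
        }
        return size;
    }

    public static void Main()
    {
        string myString = "*.txt";
        Console.WriteLine("ASCII:        " + EncodedSize(myString, Encoding.ASCII, false));   // 5
        Console.WriteLine("UTF-8:        " + EncodedSize(myString, Encoding.UTF8, false));    // 5
        Console.WriteLine("UTF-8 + BOM:  " + EncodedSize(myString, Encoding.UTF8, true));     // 8
        Console.WriteLine("UTF-16:       " + EncodedSize(myString, Encoding.Unicode, false)); // 10
        Console.WriteLine("UTF-16 + BOM: " + EncodedSize(myString, Encoding.Unicode, true));  // 12
    }
}
```

These numbers describe the encoded text only. They say nothing about how much memory the string object occupies on the managed heap.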

So: the reason you're getting an unexpected answer is that you're asking the wrong question. For the "in memory" discussion, you're really after the first bullet. It isn't easy to give an exact answer because the size of the object overhead depends on your target platform.
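
For a rough "in memory" number, something like the sketch below may be good enough. It is an estimate under stated assumptions, not a documented guarantee: it assumes an object header of two pointer-sized words, a 4-byte length field, two bytes per char plus a 2-byte terminator, and rounding up to the pointer size. The names StringSizeSketch and EstimateBytes are hypothetical, and the real layout depends on the CPU and the .NET version.

```csharp
using System;

static class StringSizeSketch
{
    // Hypothetical helper: rough estimate of the heap size of a string.
    // Every constant here is an ASSUMPTION about the runtime's layout.
    public static long EstimateBytes(int length, int pointerSize)
    {
        long header = 2L * pointerSize;                 // assumed: header word + type pointer
        long lengthField = sizeof(int);                 // assumed: 4-byte length field
        long chars = (long)sizeof(char) * (length + 1); // UTF-16 data + assumed terminator
        long raw = header + lengthField + chars;
        // Round up to the next multiple of the pointer size (alignment padding).
        return (raw + pointerSize - 1) / pointerSize * pointerSize;
    }

    public static void Main()
    {
        string myString = "*.txt";
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        long estimate = EstimateBytes(myString.Length, IntPtr.Size);
        Console.WriteLine("Estimated in-memory size: " + estimate + " bytes");
    }
}
```

Under these assumptions, the 5-character string comes out at 32 bytes with 8-byte pointers and 24 bytes with 4-byte pointers, which is in line with "2 bytes per character plus a bit" and nowhere near the 29 bytes reported by BinaryFormatter for unrelated reasons.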

Marc Gravell
  • Is there a function that I could use to find the overhead if I know my target platform is Windows? Cause I could write a utility method that acts differently depending on the target platform and has a different case for each of the major platforms. Or even better, is there a function that can find the size of everything NOT actually in memory, and can subtract that off of the estimate given by the BinaryFormatter? – user430481 May 31 '19 at 17:10
  • @user430481 it isn't just Windows. It is the CPU. It is the .NET version. And stop thinking about BinaryFormatter - it has nothing to do with the information you're trying to find. But basically, "not really, go with roughly 2 bytes per character plus 4 bytes plus a bit". – Marc Gravell May 31 '19 at 20:15