I need a byte buffer to send over TCP, and I need a way of efficiently determining the number of bytes produced by encoding something like a string.

There would be no need for this if I simply used this code:

byte[] buffer = encoder.GetBytes("Hello Client!");
clientStream.Write(buffer, 0, buffer.Length);

But the problem is, I'm going to be sending multiple messages one after another, and this code allocates memory for a byte buffer every time I want to send a message. It's my understanding that this is inefficient/slow because it allocates memory every time.

What I want to do is create one large byte buffer, write all my messages into it, and send only the part of the array that holds the message. But I can't find a way of doing this efficiently. Encoding.ASCII.GetBytes(string) just returns a new byte array, which I then have to copy into my large buffer starting at position 0. I need the length in bytes of the message placed into the buffer, without calling GetBytes(string).Length, because that encodes the string a second time, which is inefficient.

There is probably some obvious solution to this that I can't find.

Dai
Dan Webster
    Before you start to worry about inefficiencies you should profile and measure your code. You will probably find that this won't be a bottleneck though, but you can't know for sure until you measure. – Some programmer dude Sep 26 '12 at 07:15
  • Although it is good to know the best way to write the code, I feel that with platforms like .NET, worrying about micro-optimization is a waste of time, as the CLR will optimize your code when you compile it – Vamsi Sep 26 '12 at 07:27
  • @VamsiKrishna The CLR isn't magic. Laxity on your part leads to non-performant code. It just happens that most of the time it goes unnoticed. – Asti Sep 26 '12 at 07:36
  • @Asti I never said it was magic, and I never said that it's not required to learn how to write good code. I just said that sometimes the compiler will do the optimization for you. Maybe I didn't say it exactly like that, but that's what I meant – Vamsi Sep 26 '12 at 07:45
  • @VamsiKrishna Fair enough. These days there's a trend (mostly among people who started off with Java/.NET) towards not bothering about performance / GC at all, because people have this notion that the compiler/VM/JIT will do something about it (which happens less often than not). Like [this question.](http://stackoverflow.com/questions/12428622/how-can-i-combine-a-method-and-a-dictionary-used-by-the-method-for-lookups/12428735) It's like people are trying to compensate for Moore's law by writing slower code. – Asti Sep 26 '12 at 09:24
  • @Asti I agree with you on that. However, there is also a trend where some people (myself included) worry so much about optimizing minute details that they end up reinventing the wheel, where the end result is not even a perfect circle. In my above comment I was just trying to give the OP a heads-up about going down that path. That Moore's law joke was a good one though (+1 for that) – Vamsi Sep 26 '12 at 09:35

1 Answer


I agree with Joachim in that you seem like you're trying to prematurely optimize your program without any evidence (such as profiling data) that suggests you need to do this in the first place. The great Donald Knuth said "premature optimization is the root of all evil" - take it to heart.

That aside, the second issue is that allocation is not an expensive operation: in .NET, allocating a managed array generally completes in O(1) time (roughly a pointer bump in generation 0 of the GC heap). The actual encoding operation is many times more expensive.

Third, yes, there is a solution to your problem, but I don't see the point: the number of bytes a string requires in a given encoding is unpredictable, which is why (by default) the Encoding classes allocate and return their own buffer. That way you never need to call the method again with a bigger buffer because your initial call provided an insufficiently-sized one.

Another problem is that .NET strings, unlike C's null-terminated strings, store an explicit length and have no terminator (a .NET string can contain the null character; a C string can't). So you may need to clear the buffer every time you reuse it, which further slows down your program.
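One common alternative that avoids clearing the buffer entirely is length-prefix framing: send the byte count before the payload, so the receiver never reads past the end of the message and stale bytes in the reused buffer don't matter. This is a minimal sketch, not the asker's code; the `SendMessage` helper and the 4-byte prefix format are my own assumptions:

```csharp
using System;
using System.IO;
using System.Text;

class Framing
{
    // Hypothetical helper: writes one length-prefixed message into `stream`,
    // reusing `buffer` across calls. Only the first `bytesWritten` bytes are
    // ever sent, so leftover bytes from earlier messages are harmless.
    static void SendMessage(Stream stream, Encoding encoding, byte[] buffer, string message)
    {
        // This GetBytes overload writes into the existing buffer and
        // returns how many bytes it produced.
        int bytesWritten = encoding.GetBytes(message, 0, message.Length, buffer, 0);

        // 4-byte length prefix tells the receiver where the message ends.
        byte[] prefix = BitConverter.GetBytes(bytesWritten);
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(buffer, 0, bytesWritten);
    }

    static void Main()
    {
        var stream = new MemoryStream(); // stand-in for a NetworkStream
        var buffer = new byte[1024];
        SendMessage(stream, Encoding.ASCII, buffer, "Hello Client!");
        Console.WriteLine(stream.Length); // 4-byte prefix + 13 payload bytes
    }
}
```

Note that `BitConverter.GetBytes` uses the machine's native endianness; for a real wire protocol you'd want to pin the byte order down explicitly.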

There are two methods you'd need to use: Encoding.GetByteCount(String) and Encoding.GetBytes(String, Int32, Int32, Byte[], Int32), like so:

Encoding encoder = ...
Byte[] buffer = new Byte[1024]; // allocate a 1 KB buffer, hoped to be large enough for every possible string

foreach(String str in stringsToEncode) {
    Array.Clear( buffer, 0, buffer.Length ); // reset every byte to zero (your program may need this, or it may not; I don't know enough about it)

    Int32 bytesWritten;
    do {
        try {
            bytesWritten = encoder.GetBytes( str, 0, str.Length, buffer, 0 );
        } catch(ArgumentException) {
            // the buffer was too small: double it and try again
            bytesWritten = -1;
            buffer = new Byte[ buffer.Length * 2 ];
        }
    } while( bytesWritten == -1 );
}

Of course this code is going to have problems of its own. But you should get the idea.
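The catch-and-retry loop can also be avoided: `Encoding.GetMaxByteCount(Int32)` returns a cheap worst-case size without performing an encoding pass, so the buffer can be grown up front, and the array-filling `GetBytes` overload already returns the number of bytes it wrote. A self-contained sketch of that approach (buffer size and message are illustrative):

```csharp
using System;
using System.Text;

class EncodeIntoBuffer
{
    static void Main()
    {
        Encoding encoding = Encoding.ASCII;
        string message = "Hello Client!";

        byte[] buffer = new byte[1024];

        // Worst-case byte count for this string length, computed without
        // encoding anything; grow the buffer first if it can't possibly fit.
        int maxBytes = encoding.GetMaxByteCount(message.Length);
        if (maxBytes > buffer.Length)
            buffer = new byte[maxBytes];

        // Encode directly into the reusable buffer; the return value is the
        // byte count the question asks for -- no second encoding pass needed.
        int bytesWritten = encoding.GetBytes(message, 0, message.Length, buffer, 0);

        Console.WriteLine(bytesWritten); // "Hello Client!" is 13 ASCII bytes
    }
}
```

`GetMaxByteCount` deliberately over-estimates (for ASCII it returns length + 1); if you need the exact size before encoding, `GetByteCount(String)` gives it at the cost of an extra pass over the string.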

Dai