You can't avoid allocating a byte[] if you use UTF-8, because it is a variable-length encoding: the length of the resulting string can be determined only after all of the bytes have been read.
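To make the variable-length point concrete, here is a small sketch that only uses Encoding.UTF8.GetByteCount and Encoding.UTF8.GetCharCount. The class and method names (Utf8LengthDemo, Utf8ByteCount, Utf8CharCount) are made up for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf8LengthDemo
{
    // Number of bytes UTF-8 needs to encode the given text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of UTF-16 chars a UTF-8 byte sequence decodes to.
    public static int Utf8CharCount(byte[] bytes)
    {
        return Encoding.UTF8.GetCharCount(bytes);
    }
}
```

For example, "A" takes 1 byte, "\u00E9" (é) takes 2 and "\u20AC" (€) takes 3, so the byte count alone does not tell you how many chars the string will have.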
Let's look at the UTF8Encoding.GetString method:
    public override unsafe String GetString(byte[] bytes, int index, int count)
    {
        // Avoid problems with empty input buffer
        if (bytes.Length == 0) return String.Empty;

        fixed (byte* pBytes = bytes)
            return String.CreateStringFromEncoding(
                pBytes + index, count, this);
    }
It calls the String.CreateStringFromEncoding method, which first computes the length of the resulting string, then allocates the string and fills it with characters without any additional allocations. UTF8Encoding.GetChars doesn't allocate anything either.
    unsafe static internal String CreateStringFromEncoding(
        byte* bytes, int byteLength, Encoding encoding)
    {
        // First pass: count the characters without decoding them
        int stringLength = encoding.GetCharCount(bytes, byteLength, null);
        if (stringLength == 0)
            return String.Empty;

        // Second pass: allocate the string and decode directly into it
        String s = FastAllocateString(stringLength);
        fixed (char* pTempChars = &s.m_firstChar)
        {
            encoding.GetChars(bytes, byteLength, pTempChars, stringLength, null);
        }

        return s;
    }
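The same count-then-fill pattern can be sketched with public APIs. This is only an illustration, assuming .NET Core 2.1 or later (where string.Create and the span-based Encoding.GetChars overload exist); TwoPassDecoder and Decode are hypothetical names, and this is not how the framework itself is implemented.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TwoPassDecoder
{
    public static string Decode(Encoding encoding, byte[] bytes, int index, int count)
    {
        // First pass: find out how many chars the bytes decode to.
        int charCount = encoding.GetCharCount(bytes, index, count);
        if (charCount == 0)
            return string.Empty;

        // Second pass: allocate the string once and decode straight into it.
        // The state tuple avoids capturing locals in the lambda.
        return string.Create(charCount, (encoding, bytes, index, count), (chars, state) =>
        {
            var source = new ReadOnlySpan<byte>(state.bytes, state.index, state.count);
            state.encoding.GetChars(source, chars);
        });
    }
}
```

In practice Encoding.GetString(bytes, index, count) already does this for you; the sketch just shows that the only allocation involved is the string itself.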
If you use a fixed-length encoding, you can allocate the string directly and call Encoding.GetChars on it. But you will lose performance by calling Stream.ReadByte many times, since there's no Stream.Read overload that accepts a byte* as an argument.
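    // Requires an unsafe context; n, bytesPerCharacter, stream and encoding come from the caller.
    // bufferSize must be a multiple of bytesPerCharacter.
    const int bufferSize = 256;
    string str = new string('\0', n / bytesPerCharacter);
    byte* bytes = stackalloc byte[bufferSize];
    fixed (char* pinnedChars = str)
    {
        char* chars = pinnedChars;
        // Decode the input in chunks of at most bufferSize bytes
        for (int i = n; i > 0; i -= bufferSize)
        {
            int byteCount = Math.Min(bufferSize, i);
            int charCount = byteCount / bytesPerCharacter;
            // No Stream.Read overload takes a byte*, so read one byte at a time
            for (int j = 0; j < byteCount; ++j)
                bytes[j] = (byte)stream.ReadByte();
            encoding.GetChars(bytes, byteCount, chars, charCount);
            chars += charCount;
        }
    }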
So you are already using the best available way to get strings. The only thing left to improve in this situation is to implement a ByteArrayCache class, similar to StringBuilderCache.
    public static class ByteArrayCache
    {
        // One cached array per thread, so no locking is needed
        [ThreadStatic]
        private static byte[] cachedInstance;
        private const int maxArraySize = 1024;

        // Returns an array of at least the requested size (it may be longer)
        public static byte[] Acquire(int size)
        {
            if (size <= maxArraySize)
            {
                byte[] instance = cachedInstance;
                if (instance != null && instance.Length >= size)
                {
                    cachedInstance = null;
                    return instance;
                }
            }
            return new byte[size];
        }

        // Keeps the array for reuse if it is small enough and larger than the cached one
        public static void Release(byte[] array)
        {
            if ((array != null && array.Length <= maxArraySize) &&
                (cachedInstance == null || cachedInstance.Length < array.Length))
            {
                cachedInstance = array;
            }
        }
    }
Usage:
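    var bytes = ByteArrayCache.Acquire(n);
    // Note: Stream.Read may return fewer than n bytes; loop on it in production code
    stream.Read(bytes, 0, n);
    // The cached array may be longer than n, so decode only the first n bytes
    var str = Encoding.UTF8.GetString(bytes, 0, n);
    ByteArrayCache.Release(bytes);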
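If you are on a runtime that ships System.Buffers, the built-in ArrayPool<byte> can play the same role as a hand-written cache. The following is a minimal sketch under that assumption, not a drop-in replacement for the class above; PooledStringReader and ReadUtf8String are hypothetical names. It also loops on Stream.Read, since a single call may return fewer bytes than requested.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class PooledStringReader
{
    public static string ReadUtf8String(Stream stream, int byteCount)
    {
        // Rent may hand back an array longer than byteCount.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(byteCount);
        try
        {
            int read = 0;
            while (read < byteCount)
            {
                int r = stream.Read(buffer, read, byteCount - read);
                if (r == 0)
                    throw new EndOfStreamException();
                read += r;
            }
            // Decode only the bytes that belong to this string.
            return Encoding.UTF8.GetString(buffer, 0, byteCount);
        }
        finally
        {
            // Always give the array back, even if reading or decoding throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Usage would look like `string s = PooledStringReader.ReadUtf8String(stream, n);`. The try/finally is the main design point: with either a pool or a cache, the array should be returned on every path.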