Decode zero-terminated UTF-8 string from any position in a byte array

Question

I am looking for a method in the FCL similar to Encoding.UTF8.GetString(bytes, index, count), but one that does not require a count argument and instead assumes that the string at the given index is null-terminated.

I am posting my current solution as an answer (see below), but am curious to see whether someone knows of a more elegant or better-performing approach.

I'm not aware of any method in the .NET Framework that works with 0-terminated UTF-8 strings. The [Marshal Class](http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.marshal.aspx) works with 0-terminated strings but does no UTF-8 conversion; the [UTF8Encoding Class](http://msdn.microsoft.com/en-us/library/system.text.utf8encoding.aspx) always expects a length. — dtb, Feb 24 '13 at 13:29

stakx - no longer contributing · Accepted Answer · 2013-02-24T13:30:39.803

I've written my own method since I haven't found one in the FCL:

using System;
using System.Text;

// Decodes the UTF-8 string that starts at `index` and ends just before the next 0 byte.
string GetZeroTerminatedUTF8StringAt(byte[] bytes, int index)
{
    // Array.IndexOf returns -1 if there is no 0 byte at or after `index`.
    int zeroTerminatorIndex = Array.IndexOf<byte>(bytes, value: 0, startIndex: index);
    if (zeroTerminatorIndex >= index)
    {
        return Encoding.UTF8.GetString(bytes, index, count: zeroTerminatorIndex - index);
    }
    else
    {
        throw new ArgumentOutOfRangeException("index", "No zero-terminator found.");
    }
}

While this works, it has one minor issue: it assumes that no character other than '\0' (U+0000) contains a 0 byte in its UTF-8 encoding. This is in fact the case, since every byte of a multi-byte UTF-8 sequence has its high bit set, but it would be nicer if that assumption were fully encapsulated inside the Encoding.UTF8 class.
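As a rough illustration of how the method behaves on a buffer holding several strings, here is a minimal, self-contained sketch. The class name ZeroTerminatedUtf8Demo and the sample buffer are hypothetical and only serve the example; the method body repeats the logic shown earlier so that the snippet compiles on its own.

```csharp
using System;
using System.Text;

public static class ZeroTerminatedUtf8Demo
{
    // Same logic as the answer's method, repeated here so the example is self-contained.
    public static string GetZeroTerminatedUTF8StringAt(byte[] bytes, int index)
    {
        int zeroTerminatorIndex = Array.IndexOf<byte>(bytes, 0, index);
        if (zeroTerminatorIndex < 0)
        {
            throw new ArgumentOutOfRangeException("index", "No zero-terminator found.");
        }
        return Encoding.UTF8.GetString(bytes, index, zeroTerminatorIndex - index);
    }

    public static void Main()
    {
        // Hypothetical buffer: two zero-terminated strings stored back to back.
        byte[] buffer = Encoding.UTF8.GetBytes("héllo\0wörld\0");

        // "héllo" occupies 6 bytes (é is a two-byte sequence), its terminator
        // sits at offset 6, and the second string therefore starts at offset 7.
        string first = GetZeroTerminatedUTF8StringAt(buffer, 0);  // "héllo"
        string second = GetZeroTerminatedUTF8StringAt(buffer, 7); // "wörld"

        Console.WriteLine(first);
        Console.WriteLine(second);
    }
}
```

Note that the offsets are byte offsets, not character offsets: the two-byte characters shift the start of the second string, yet none of their bytes is 0, so the terminator search is not confused by them.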

