0

I'm trying to write a byte[] into XML as hex, like this:

new byte[] { 1, 2, 3, 10 } => "0102030A"

I've seen good posts about the conversion itself, but I couldn't find a good way to write chars into XML one by one, since XmlWriter has no WriteChar method and no WriteRaw overload that takes a single char (unlike TextWriter).

Here's what I'm doing at the moment:

const string HexChars = "0123456789ABCDEF";

public static void WriteHex(this XmlWriter writer, byte[] bytes)
{
    unchecked
    {
        for (int i = 0; i < bytes.Length; i++)
        {
            var b = bytes[i];
            writer.WriteRaw(HexChars[b >> 4].ToString());
            writer.WriteRaw(HexChars[b & 15].ToString());
        }
    }
}

I don't want to allocate a new array twice the size of the byte[] and then write it to the XML. WriteBinHex adds hyphens between values, which is why I didn't use it. I see that the base stream is exposed through a property, but I guess using it is a bad idea. What I'm trying to achieve is doing this in a more "streamy" way.

So my question is: what is the fastest way to write a single char into XML?

I'm currently thinking of using a small char[] buffer and writing it out in a loop, if I can't find a better way.

EDIT:

Sorry, I was wrong about WriteBinHex: its output is exactly what I was looking for. I'm adding some benchmarks as an answer, so maybe they can help someone else.
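For reference, here is a minimal sketch of the WriteBinHex call mentioned in the edit above. The class name BinHexDemo and the element name "data" are hypothetical and only there to make the sample self-contained.

```csharp
using System;
using System.Text;
using System.Xml;

public static class BinHexDemo
{
    // Writes the bytes as the content of a single element using
    // XmlWriter.WriteBinHex and returns the resulting XML as a string.
    public static string ToXml(byte[] bytes)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("data"); // hypothetical element name
            writer.WriteBinHex(bytes, 0, bytes.Length);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

Calling BinHexDemo.ToXml(new byte[] { 1, 2, 3, 10 }) should produce <data>0102030A</data>, i.e. uppercase hex without separators.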

  • Is that really your question, or is your question "what is the fastest way to write bytes as hex"? That's different. – usr Jan 22 '16 at 14:51
  • WriteRaw certainly is more risky and harder to understand because it bypasses the XML structure completely. It basically just outputs characters to the base stream. What's wrong with the char[] loop approach? – usr Jan 22 '16 at 14:52
  • Both will help, but the primary question is writing a single char, since I'll later implement the same method with a stream (instead of byte[]) as the parameter. – Ramazan Binarbasi Jan 22 '16 at 14:54
  • And what about WriteValue(char)? – usr Jan 22 '16 at 14:57
  • @usr: I ILSpied WriteValue; it does many checks, which will be too slow for me. – Ramazan Binarbasi Jan 22 '16 at 14:59
  • @Sinatr, I wrote my own serializer. In my project I'll be collecting time series values through web services all day (around 10M service calls per day), and users will also query the saved values back. That's why serialization is my bottleneck atm. – Ramazan Binarbasi Jan 22 '16 at 15:07
  • Correct me if I'm wrong, but wouldn't `.ToString("X")` be able to replace the HexChars constant for converting from decimal to hexadecimal? – Chase Jan 22 '16 at 15:13
  • You are asking the wrong question. Writing a single byte to a file will never be fast. All the checks, validations etc. that the XML library does are always an order of magnitude faster than writing a single byte to the file. If you want speed, try to write as much as possible in a single IO operation. – jgauffin Jan 22 '16 at 15:16
  • @jgauffin: Thanks for the comment, I'll try that with [ThreadLocal] as *usr* suggested. I'll post benchmarks later. I was dreaming that I could hand a stream reader to the XML writer and it would do the work :) – Ramazan Binarbasi Jan 22 '16 at 15:19
  • @Chase: String formatting is too slow. I'm already doing the conversion; I'm just looking for a better way that uses streams etc. – Ramazan Binarbasi Jan 22 '16 at 15:21

2 Answers

3

Since you want to write chars individually, WriteRaw seems to be the fastest way, especially since you have already excluded WriteValue.

You can optimize away this HexChars[b >> 4].ToString() expression by precalculating the strings.
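A minimal sketch of that precalculation, assuming a 16-entry table of single-character strings (the names HexStringTable, HexStrings, and WriteHexPerChar are hypothetical). It only removes the per-char ToString() allocation; each char still goes through WriteRaw individually.

```csharp
using System.Xml;

public static class HexStringTable
{
    // One preallocated single-character string per hex digit, so the
    // loop below never calls char.ToString().
    private static readonly string[] HexStrings =
    {
        "0", "1", "2", "3", "4", "5", "6", "7",
        "8", "9", "A", "B", "C", "D", "E", "F"
    };

    public static void WriteHexPerChar(this XmlWriter writer, byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++)
        {
            byte b = bytes[i];
            writer.WriteRaw(HexStrings[b >> 4]);  // high nibble
            writer.WriteRaw(HexStrings[b & 15]);  // low nibble
        }
    }
}
```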

If I were you, I would use a method that writes entire strings, so that the chars do not have to pass through the entire processing and call tree individually. Judging by what these methods do in Reflector, that could provide something like a 10x speedup. You said that you are not considering this approach, though.

In Reflector I see that WriteRaw also does quite a lot of work. I think this needs to be benchmarked.

If you don't like the temporary char[] or byte[] allocations you can use a [ThreadStatic] temporary buffer for that. The buffer size probably should be in the range 16-256. Big enough to diminish all constant overheads and small enough to fit into the L1 cache and not pollute that cache too much.

usr
  • Well, using ThreadLocal nearly doubles the execution time. If you have a sample I'd be glad to see it. I tried wrapping the buffer creation in a ThreadLocal and disposing it at the end; execution time rose from 570 ms to 960 ms. – Ramazan Binarbasi Jan 23 '16 at 04:37
  • ThreadLocal is quite slow, use [ThreadStatic]. I meant that but misremembered the name. Also, post code. Hard to say anything about perf without it. – usr Jan 23 '16 at 09:43
  • If you are trying char-based APIs (not recommended by me) you can try calculating the char using arithmetic instead of using a table. Should be slightly faster since the arithmetic required is just an addition. – usr Jan 23 '16 at 09:45
  • Lookup is faster than arithmetic according to this answer to another question: http://stackoverflow.com/a/624379/2266524 . I tried ThreadStatic and it's still slower than a plain buffer. I'll add benchmarks of ThreadStatic and of the fastest method mentioned in the link above. – Ramazan Binarbasi Jan 23 '16 at 20:30
  • Ah, I overlooked the fact that 0-9 and a-f are not adjacent in ASCII. That makes the lookup solution a competitive candidate. In any case the XmlWriter probably takes 95% of the time so that this optimization helps only a little. The big one is removing the ToString allocation for every char. – usr Jan 23 '16 at 21:58
-1

I tried 5 methods; here are the benchmarks.

First of all, the code is compiled in Release mode, timing is done with Stopwatch, and 4 different array lengths are measured. A GC collection is forced before each measurement. Iteration counts differ per length so that the times come out comparable (e.g. byte[16] is iterated 100K times, byte[128K] 40 times). Each iteration creates an XmlWriter and writes the same byte[] into it as 10 elements.
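The setup described above could be sketched roughly as follows. This is not the original harness: the class name HexBenchmark, the element names, and the choice of Stream.Null as the output target are all assumptions.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

public static class HexBenchmark
{
    // Times `iterations` runs. Each run creates an XmlWriter and writes
    // the same payload as 10 elements using the supplied write method.
    public static long MeasureMs(Action<XmlWriter, byte[]> writeHex, byte[] bytes, int iterations)
    {
        // Force a collection before measuring, as described above.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var stopwatch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            // Assumption: the output is discarded; the original target is unknown.
            using (var writer = XmlWriter.Create(Stream.Null))
            {
                writer.WriteStartElement("root");
                for (int e = 0; e < 10; e++)
                {
                    writer.WriteStartElement("item");
                    writeHex(writer, bytes);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

For example, HexBenchmark.MeasureMs((w, b) => w.WriteBinHex(b, 0, b.Length), new byte[16], 100000) would time the WriteBinHex baseline for the 16-byte case.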

All methods are compared against the method below, which is XmlWriter's WriteBinHex:

writer.WriteBinHex(bytes, 0, bytes.Length);

All of the methods below run within an unchecked block (e.g. unchecked { ... }).

Method-1: Full Char[]

var result = new char[bytes.Length * 2];
byte b;
for (int i = 0; i < bytes.Length; i++)
{
    b = bytes[i];
    result[i * 2] = HexChars[b >> 4];
    result[i * 2 + 1] = HexChars[b & 15];
}
writer.WriteRaw(result, 0, result.Length);

Method-2: Buffer

var bufferIndex = 0;
var bufferLength = bytes.Length < 2048 ? bytes.Length * 2 : 4096;
var buffer = new char[bufferLength];

for (int i = 0; i < bytes.Length; i++)
{
    var b = bytes[i];
    buffer[bufferIndex] = HexChars[b >> 4];
    buffer[bufferIndex + 1] = HexChars[b & 15];

    bufferIndex += 2;
    if (bufferIndex == bufferLength)
    {
        writer.WriteRaw(buffer, 0, bufferLength);
        bufferIndex = 0;
    }
}

if (bufferIndex > 0)
    writer.WriteRaw(buffer, 0, bufferIndex);

Method-3: RawCharByChar

for (int i = 0; i < bytes.Length; i++)
{
    var b = bytes[i];
    writer.WriteRaw(HexChars[b >> 4].ToString());
    writer.WriteRaw(HexChars[b & 15].ToString());
}

Method-4: StringFormatX2

for (int i = 0; i < bytes.Length; i++)
    writer.WriteRaw(bytes[i].ToString("X2")); // "X2" gives uppercase digits, matching the other methods

Results (array length vs. time in ms; AVG is the average difference from BinHex):

Method: BinHex
16 bytes: 971 ms, 1 KB: 800 ms, 128 KB: 906 ms, 2 MB: 1291 ms

Method: Full Char[]
16 bytes: 828 ms, 1 KB: 612 ms, 128 KB: 780 ms, 2 MB: 1112 ms
AVG: -16%

Method: Buffer
16 bytes: 834 ms, 1 KB: 671 ms, 128 KB: 712 ms, 2 MB: 1059 ms
AVG: -17%

Method: RawCharByChar
16 bytes: 2624 ms, 1 KB: 6515 ms, 128 KB: 6979 ms, 2 MB: 8282 ms
AVG: +524%

Method: StringFormatX2
16 bytes: 3706 ms, 1 KB: 10025 ms, 128 KB: 10490 ms, 2 MB: 26562 ms
AVG: +1113%

I'll go with the Buffer implementation in this case, which is on average 17% faster than WriteBinHex.

EDIT:

With a buffer field marked [ThreadStatic] (compared to the WriteBinHex method):

16 bytes: -3%, 1 KB: -10%, 128 KB: -14%, 2 MB: -11%
Average: -9%, versus -17% with the normal buffer, so I'm giving up on ThreadLocal/ThreadStatic. I also tried 128- and 256-char buffers and got similar results.

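[ThreadStatic]
static char[] _threadStaticBuffer; // no initializer: it would only run for the first thread

private void Test(XmlWriter writer, byte[] bytes)
{
    // Read the thread-static field once and cache it in a local;
    // allocate lazily so that every thread gets its own buffer.
    var buffer = _threadStaticBuffer ?? (_threadStaticBuffer = new char[240]);
    var bufferIndex = 0;
    var bufferLength = bytes.Length < 120 ? bytes.Length * 2 : 240;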

    for (int i = 0; i < bytes.Length; i++)
    {
        var b = bytes[i];
        buffer[bufferIndex] = HexChars[b >> 4];
        buffer[bufferIndex + 1] = HexChars[b & 15];

        bufferIndex += 2;
        if (bufferIndex == bufferLength)
        {
            writer.WriteRaw(buffer, 0, bufferLength);
            bufferIndex = 0;
        }
    }

    if (bufferIndex > 0)
        writer.WriteRaw(buffer, 0, bufferIndex);
}

EDIT-2:

After reading some posts, I benchmarked my Method-2 against the method mentioned in https://stackoverflow.com/a/624379/2266524, which uses a 256-entry uint lookup instead of the 16-char lookup.

Here are the results (in ms) compared to the WriteBinHex method:

Method: WriteBinHex
16 bytes: 745 ms, 1 KB: 679 ms, 128 KB: 739 ms, 2 MB: 1038 ms

Method: Buffered char[] 256 uint lookup
16 bytes: 653 ms, 1 KB: 454 ms, 128 KB: 502 ms, 2 MB: 758 ms
AVG: -26%

Method: Buffered char[] unsafe 256 uint lookup
16 bytes: 645 ms, 1 KB: 371 ms, 128 KB: 424 ms, 2 MB: 663 ms
AVG: -34%

The code:

Method-5: Buffer with 256 uint lookup

private static readonly uint[] _hexConversionLookup = CreateHexConversionLookup();
private static uint[] CreateHexConversionLookup()
{
    var result = new uint[256];
    for (int i = 0; i < 256; i++)
    {
        string s = i.ToString("X2");
        result[i] = ((uint)s[0]) + ((uint)s[1] << 16);
    }
    return result;
}

private void TestBufferWith256UintLookup(XmlWriter writer, byte[] bytes)
{
    unchecked
    {
        var bufferIndex = 0;
        var bufferLength = bytes.Length < 2048 ? bytes.Length * 2 : 4096;
        var buffer = new char[bufferLength];

        for (int i = 0; i < bytes.Length; i++)
        {
            var b = _hexConversionLookup[bytes[i]];
            buffer[bufferIndex] = (char)b;
            buffer[bufferIndex + 1] = (char)(b >> 16);

            bufferIndex += 2;
            if (bufferIndex == bufferLength)
            {
                writer.WriteRaw(buffer, 0, bufferLength);
                bufferIndex = 0;
            }
        }

        if (bufferIndex > 0)
            writer.WriteRaw(buffer, 0, bufferIndex);
    }
}

Method-6: Unsafe buffer with 256 uint lookup

private static readonly uint[] _hexConversionLookup = CreateHexConversionLookup();
private static uint[] CreateHexConversionLookup()
{
    var result = new uint[256];
    for (int i = 0; i < 256; i++)
    {
        string s = i.ToString("X2");
        result[i] = ((uint)s[0]) + ((uint)s[1] << 16);
    }
    return result;
}

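// Pinned for the lifetime of the process so the raw pointer stays valid; the handle is intentionally never freed.
// GCHandle lives in System.Runtime.InteropServices.
private unsafe static readonly uint* _byteHexCharsP = (uint*)GCHandle.Alloc(_hexConversionLookup, GCHandleType.Pinned).AddrOfPinnedObject();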

private unsafe void TestBufferWith256UintLookupUnsafe(XmlWriter writer, byte[] bytes)
{
    fixed (byte* bytesP = bytes)
    {
        var bufferIndex = 0;
        var bufferLength = bytes.Length < 2048 ? bytes.Length : 2048;
        var charBuffer = new char[bufferLength * 2];
        fixed (char* bufferP = charBuffer)
        {
            uint* buffer = (uint*)bufferP;
            for (int i = 0; i < bytes.Length; i++)
            {
                // One 32-bit store writes both hex chars; this relies on a little-endian layout.
                buffer[bufferIndex] = _byteHexCharsP[bytesP[i]];

                bufferIndex++;
                if (bufferIndex == bufferLength)
                {
                    writer.WriteRaw(charBuffer, 0, bufferLength * 2);
                    bufferIndex = 0;
                }
            }
        }

        if (bufferIndex > 0)
            writer.WriteRaw(charBuffer, 0, bufferIndex * 2);
    }
}

My choice is #6, but you may prefer #5 if you want a safe version. I'd appreciate any suggestions for making it faster, thanks.
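Since the question mentions later taking a stream instead of a byte[], here is a rough sketch of how the buffered approach might be adapted to a Stream. It uses the safe 16-char lookup from Method-2 for brevity; the class name, the WriteHex(Stream) overload, and the 2048-byte chunk size are assumptions, and it has not been benchmarked.

```csharp
using System.IO;
using System.Xml;

public static class XmlWriterHexExtensions
{
    private const string HexChars = "0123456789ABCDEF";

    // Reads the stream in chunks and writes each chunk as hex characters,
    // so neither the input nor the output is ever held in memory as a whole.
    public static void WriteHex(this XmlWriter writer, Stream input)
    {
        var bytes = new byte[2048];              // assumed chunk size
        var chars = new char[bytes.Length * 2];  // two hex chars per byte
        int read;
        while ((read = input.Read(bytes, 0, bytes.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = bytes[i];
                chars[i * 2] = HexChars[b >> 4];
                chars[i * 2 + 1] = HexChars[b & 15];
            }
            writer.WriteRaw(chars, 0, read * 2);
        }
    }
}
```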

Community
  • Thanks for posting! In the thread-static version _threadStaticBuffer is not used at all? Is that a bug? In any case make sure it is only read/written once, not for each access. Access is expensive, but amortized over a big buffer you'll not be able to even measure the overhead. – usr Jan 25 '16 at 12:58
  • I think it is copy paste error, I will check and post back. – Ramazan Binarbasi Jan 25 '16 at 20:19
  • Yup, it was a copy-paste error, sorry about that. I've corrected it and rechecked with a 240 byte buffer. It's 6% faster but still not better than the others. Thanks for taking the time to help me. I'm going with method #6, which is faster and suitable for streams (I mean chunked). – Ramazan Binarbasi Jan 26 '16 at 00:19
  • The code now uses _threadStaticBuffer all the time which causes TLS access for each byte. You need to cache that value in a local once. Would be interested in hearing about the new benchmark results. – usr Jan 26 '16 at 08:27
  • I've edited the answer, tried it and there's no change. Both class level field and function level variable usage result in same performance. – Ramazan Binarbasi Jan 27 '16 at 17:27