6

Before we start, I am aware of the term "premature optimization". However the following snippets have proven to be an area where improvements can be made.

Alright. We currently have some network code that works with string based packets. I am aware that using strings for packets is stupid, crazy and slow. Sadly, we don't have any control over the client and so have to use strings.

Each packet is terminated by \0\r\n and we currently use a StreamReader/Writer to read individual packets from the stream. Our main bottleneck comes from two places.

Firstly: We need to trim that nasty little null-byte off the end of the string. We currently use code like the following:

line = await reader.ReadLineAsync();
line = line.Replace("\0", ""); // PERF this allocates a new string
if (string.IsNullOrWhiteSpace(line))
    return null;
var packet = ClientPacket.Parse(line, cl.Client.RemoteEndPoint);

As you can see by that cute little comment, we have a GC performance issue when trimming the '\0'. There are numerous different ways you could trim a '\0' off the end of a string, but all will result in the same GC hammering we get. Because all string operations are immutable, they result in a new string object being created. As our server handles 1000+ connections all communicating at around 25-40 packets per second (its a game server), this GC matter is becoming an issue. So here comes my first question: What is a more efficient way of trimming that '\0' off the end of our string? By efficient I don't only mean speed, but also GC wise (ultimately I'd like a way to get rid of it without creating a new string object!).

Our second issue also stems from GC land. Our code looks somewhat like the following:

private static string[] emptyStringArray = new string[] { }; // so we dont need to allocate this
public static ClientPacket Parse(string line, EndPoint from)
{
    const char seperator = '|';

    var first_seperator_pos = line.IndexOf(seperator);
    if (first_seperator_pos < 1)
    {
        return new ClientPacket(NetworkStringToClientPacketType(line), emptyStringArray, from);
    }
    var name = line.Substring(0, first_seperator_pos);
    var type = NetworkStringToClientPacketType(name);
    if (line.IndexOf(seperator, first_seperator_pos + 1) < 1)
        return new ClientPacket(type, new string[] { line.Substring(first_seperator_pos + 1) }, from);
    return new ClientPacket(type, line.Substring(first_seperator_pos + 1).Split(seperator), from);
}

(Where NetworkStringToClientPacketType is simply a big switch-case block)

As you can see we already do a few things to handle GC. We reuse a static "empty" string and we check for packets with no parameters. My only issue here is that we are using Substring a lot, and even chain a Split on the end of a Substring. This leads to (for an average packet) almost 20 new string objects being created and 12 being disposed of EACH PACKET. This causes a lot of performance issues when load increases anything over 400 users (we gotz fast ram :3)

Has anyone had an experience with this sort of thing before or could give us some pointers into what to look into next? Maybe some magical classes or some nifty pointer magic?

(PS. StringBuilder doesn't help as we aren't building strings, we are generally splitting them.)

We currently have some ideas based on an index based system where we store the index and length of each parameter rather than splitting them. Thoughts?

A few other things. Decompiling mscorlib and browsing the string class code, it seems to me like IndexOf calls are done via P/Invoke, which would mean they have added overhead for each call, correct me if I'm wrong? Would it not be faster to implement an IndexOf manually using a char[] array?

public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType)
{
    ...
    return TextInfo.IndexOfStringOrdinalIgnoreCase(this, value, startIndex, count);
    ...
}

internal static int IndexOfStringOrdinalIgnoreCase(string source, string value, int startIndex, int count)
{
    ...
    if (TextInfo.TryFastFindStringOrdinalIgnoreCase(4194304, source, startIndex, value, count, ref result))
    {
        return result;
    }
    ...
}

...

[DllImport("QCall", CharSet = CharSet.Unicode)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool InternalTryFindStringOrdinalIgnoreCase(int searchFlags, string source, int sourceCount, int startIndex, string target, int targetCount, ref int foundIndex);

Then we get to String.Split which ends up calling Substring itself (somewhere along the line):

// string
private string[] InternalSplitOmitEmptyEntries(int[] sepList, int[] lengthList, int numReplaces, int count)
{
    int num = (numReplaces < count) ? (numReplaces + 1) : count;
    string[] array = new string[num];
    int num2 = 0;
    int num3 = 0;
    int i = 0;
    while (i < numReplaces && num2 < this.Length)
    {
        if (sepList[i] - num2 > 0)
        {
            array[num3++] = this.Substring(num2, sepList[i] - num2);
        }
        num2 = sepList[i] + ((lengthList == null) ? 1 : lengthList[i]);
        if (num3 == count - 1)
        {
            while (i < numReplaces - 1)
            {
                if (num2 != sepList[++i])
                {
                    break;
                }
                num2 += ((lengthList == null) ? 1 : lengthList[i]);
            }
            break;
        }
        i++;
    }
    if (num2 < this.Length)
    {
        array[num3++] = this.Substring(num2);
    }
    string[] array2 = array;
    if (num3 != num)
    {
        array2 = new string[num3];
        for (int j = 0; j < num3; j++)
        {
            array2[j] = array[j];
        }
    }
    return array2;
}

Thankfully Substring looks fast (and efficient!):

private unsafe string InternalSubString(int startIndex, int length, bool fAlwaysCopy)
{
    if (startIndex == 0 && length == this.Length && !fAlwaysCopy)
    {
        return this;
    }
    string text = string.FastAllocateString(length);
    fixed (char* ptr = &text.m_firstChar)
    {
        fixed (char* ptr2 = &this.m_firstChar)
        {
            string.wstrcpy(ptr, ptr2 + (IntPtr)startIndex, length);
        }
    }
    return text;
}

After reading this answer here, I'm thinking a pointer based solution could be found... Thoughts?

Thanks.

Community
  • 1
  • 1
jduncanator
  • 2,154
  • 1
  • 22
  • 39
  • 2
    Have you thought about processing the message not as a string but as an `IEnumerable` or a `char` array? – Daniel Hilgarth Sep 09 '13 at 06:57
  • @DanielHilgarth - Excellent idea...I was thinking along the same lines (though I hadn't got to `IEnumerable` yet). – Tim Sep 09 '13 at 06:58
  • @DanielHilgarth The thought has come to mind however most of the rest of the code works based on string parsing too (eg, `int.Parse(TheStringParamHere)`) so we would end up having to turn it into a string anyway. What was your idea with an IEnumerable? How would that be used? – jduncanator Sep 09 '13 at 06:59
  • Considering you are using an `async` and a `ReadLine`, are you really sure your problem is in the `string.Replace`? Because between the hidden state machine created by `async`, the `StringBuilder` probably used by `ReadLine`, the temporary buffer for the decoding of UTF8/Unicode that `ReadLine` must do... – xanatos Sep 09 '13 at 07:07
  • Well all profiling points to MASSIVE time spent in GC all centered around the string methods used. (MASSIVE == almost 60% of total CPU time) – jduncanator Sep 09 '13 at 07:14
  • @jduncanator What encoding are you using with `reader`? – xanatos Sep 09 '13 at 07:26
  • 1
    Just curious, are you using server GC mode? – Mike Zboray Sep 09 '13 at 07:27
  • @xanatos Good question, we are using UTF8Encoding. – jduncanator Sep 09 '13 at 07:31
  • @mikez Yes we are, when we first profiled it it was horrendous, then we remembered to enable Server GC on our development computers and it got a tiny bit better. – jduncanator Sep 09 '13 at 07:34
  • Why not uses string.TrimeEnd ??? – Alessandro D'Andria Sep 09 '13 at 07:36
  • @AlessandroD'Andria TrimEnd still causes a new string to be allocated. Same thing as replace. – jduncanator Sep 09 '13 at 07:45
  • Yes, but is more efficient if you want to remove at end – Alessandro D'Andria Sep 09 '13 at 07:49
  • @AlessandroD'Andria TrimEnd ends up using SubString, which indeed could be faster, however there is no promise the '\0' will ALWAYS be at the end, it could "magically" end up in the middle somehow (although I don't think this is possible), and the issue isn't with the methods performance, its with memory/GC. – jduncanator Sep 09 '13 at 07:50
  • What is the typical length of your strings? – Naruil Sep 09 '13 at 08:09
  • @Naruil Probably 50-100 characters – jduncanator Sep 09 '13 at 08:11

1 Answers1

2

You could "cheat" and work at the Encoder level...

public class UTF8NoZero : UTF8Encoding
{
    public override Decoder GetDecoder()
    {
        return new MyDecoder();
    }
}

public class MyDecoder : Decoder
{
    public Encoding UTF8 = new UTF8Encoding();

    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        return UTF8.GetCharCount(bytes, index, count);
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        int count2 = UTF8.GetChars(bytes, byteIndex, byteCount, chars, charIndex);
        int i, j;

        for (i = charIndex, j = charIndex; i < charIndex + count2; i++)
        {
            if (chars[i] != '\0')
            {
                chars[j] = chars[i];
                j++;
            }
        }

        for (int k = j; k < charIndex + count2; k++)
        {
            chars[k] = '\0';
        }

        return count2 + (i - j);
    }
}

Note that this cheat is based on the fact that StreamReader.ReadLineAsync uses only the GetChars(). We remove the '\0' in the temporary buffer char[] buffer used by StreamReader.ReadLineAsync.

xanatos
  • 109,618
  • 12
  • 197
  • 280
  • Like said, we use a large switch-case block, not nested if's. The fact we need to trim the '\0' off is because it would be a part of one of the parameters somewhere a long the line, not part of the Packet Type. – jduncanator Sep 09 '13 at 07:16
  • smart, very smart :) This could work! After browsing mscorlib source code, it looks like a pointer based solution could also be used ;) I'll Let you know what this does to performance :) Might I ask what the second for loop is for? – jduncanator Sep 09 '13 at 07:47