with the aims of
- a) Creating a function optimised for performance
- b) Having my own take on CamelCase in which capitalised acronyms are not separated (I fully accept this is not the standard definition of camel or Pascal case, but it is not an uncommon usage): "TestTLAContainingCamelCase" becomes "Test TLA Containing Camel Case" (TLA = Three Letter Acronym)
I therefore created the following (non-regex, verbose, but performance-oriented) function:
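    // Requires: using System; using System.Collections.Generic;
    // As an extension method, this must be declared inside a static class.
    public static string ToSeparateWords(this string value)
    {
        if (value == null) { return null; }
        if (value.Length <= 1) { return value; }

        char[] inChars = value.ToCharArray();
        // Indexes in inChars before which a space must be inserted.
        List<int> uCWithAnyLC = new List<int>();

        // Skip any leading run of capitals. Note that a leading acronym that
        // runs straight into a word (e.g. "TLATest") is therefore left unsplit.
        int i = 0;
        while (i < inChars.Length && char.IsUpper(inChars[i])) { ++i; }

        for (; i < inChars.Length; i++)
        {
            if (char.IsUpper(inChars[i]))
            {
                // A capital after a non-capital starts a new word.
                uCWithAnyLC.Add(i);
                if (++i < inChars.Length && char.IsUpper(inChars[i]))
                {
                    // Run of capitals (an acronym): scan to its end. If anything
                    // follows, the run's last capital starts the next word.
                    while (++i < inChars.Length)
                    {
                        if (!char.IsUpper(inChars[i]))
                        {
                            uCWithAnyLC.Add(i - 1);
                            break;
                        }
                    }
                }
            }
        }

        // Each split point adds exactly one space to the output.
        char[] outChars = new char[inChars.Length + uCWithAnyLC.Count];
        int lastIndex = 0;
        for (i = 0; i < uCWithAnyLC.Count; i++)
        {
            int currentIndex = uCWithAnyLC[i];
            // Copy the segment before this split point, shifted right by the
            // i spaces already written, then write the space itself.
            Array.Copy(inChars, lastIndex, outChars, lastIndex + i, currentIndex - lastIndex);
            outChars[currentIndex + i] = ' ';
            lastIndex = currentIndex;
        }

        // Copy the final segment after the last split point.
        int lastPos = lastIndex + uCWithAnyLC.Count;
        Array.Copy(inChars, lastIndex, outChars, lastPos, outChars.Length - lastPos);
        return new string(outChars);
    }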
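Because the index bookkeeping above is hard to follow, here is a minimal sketch of the splitting rule itself, written for readability rather than speed. It is not the function above and was not part of the benchmark: `SplitRuleSketch` and `SplitWords` are hypothetical names, and the rule it applies (a space before a capital that follows a lowercase letter, or before the last capital of a run that is followed by a lowercase letter) is my reading of the regex pattern quoted in the test setup. Unlike the function above, it also splits a leading acronym.

```csharp
using System;
using System.Text;

// Hypothetical illustration only; names are not from the original answer.
public static class SplitRuleSketch
{
    public static string SplitWords(string value)
    {
        if (string.IsNullOrEmpty(value)) { return value; }

        var sb = new StringBuilder(value.Length + 8);
        sb.Append(value[0]);
        for (int i = 1; i < value.Length; i++)
        {
            bool upper = char.IsUpper(value[i]);
            // Rule 1: a capital directly after a lowercase letter starts a word.
            bool afterLower = char.IsLower(value[i - 1]);
            // Rule 2: the last capital of a run starts a word when a
            // lowercase letter follows it.
            bool endsAcronym = char.IsUpper(value[i - 1])
                               && i + 1 < value.Length
                               && char.IsLower(value[i + 1]);
            if (upper && (afterLower || endsAcronym)) { sb.Append(' '); }
            sb.Append(value[i]);
        }
        return sb.ToString();
    }
}
```

Calling `SplitRuleSketch.SplitWords("TestTLAContainingCamelCase")` should give "Test TLA Containing Camel Case", the same result as the function above for that input.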
What surprised me most were the performance test results, using 1 000 000 iterations per function:
regex pattern used = "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))"
test string = "TestTLAContainingCamelCase":
static regex: 13 302ms
Regex instance: 12 398ms
compiled regex: 12 663ms
brent (above): 345ms
AndyRose: 1 764ms
DanTao: 995ms
The Regex instance method was only slightly faster than the static method, even over a million iterations, and I can't see any benefit from the RegexOptions.Compiled flag in these numbers. Dan Tao's very succinct code, meanwhile, came within a factor of three of my much less clear code (995ms against 345ms)!
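For anyone wanting to repeat the comparison, a timing harness along these lines would do. This is a sketch under assumptions, not the code that produced the numbers above: `SplitBenchmarkSketch` and `Time` are hypothetical names, the replacement string `"$1 "` is my guess at how the pattern was applied, and only the three regex variants are included.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical harness; not the code used for the timings quoted above.
public static class SplitBenchmarkSketch
{
    public const string Pattern = "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))";
    public const string Input = "TestTLAContainingCamelCase";
    // Assumption: the captured character is re-emitted followed by a space.
    public const string Replacement = "$1 ";

    // Runs `split` on Input `iterations` times; returns elapsed milliseconds.
    public static long Time(Func<string, string> split, int iterations)
    {
        split(Input); // warm-up call so one-off setup cost is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) { split(Input); }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        var instance = new Regex(Pattern);
        var compiled = new Regex(Pattern, RegexOptions.Compiled);

        Console.WriteLine("static regex:   {0}ms",
            Time(s => Regex.Replace(s, Pattern, Replacement), iterations));
        Console.WriteLine("Regex instance: {0}ms",
            Time(s => instance.Replace(s, Replacement), iterations));
        Console.WriteLine("compiled regex: {0}ms",
            Time(s => compiled.Replace(s, Replacement), iterations));
    }
}
```

Absolute timings will differ by machine and runtime version, so only the relative ordering is worth comparing.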