2

What is the best way of splitting up a string by capital letters in C#?

Example:

HelloStackOverflow Users.How Are you doing?

Expected result:

Hello Stack Overflow Users. How are you doing?

Kirk Woll
burnt1ce
  • How about you take a stab at it first, then come back with any questions you have with your implementation? – CanSpice Mar 30 '11 at 21:25
  • Similar question: http://stackoverflow.com/questions/1097901/regular-expression-split-string-by-capital-letter-but-ignore-tla – Rion Williams Mar 30 '11 at 21:26
  • You aren't trying to split the string; you're trying to insert spaces. – SLaks Mar 30 '11 at 21:27
  • You're right. I'm trying to insert spaces, not split the string into an array. – burnt1ce Mar 30 '11 at 21:33
  • Check out this [question](http://stackoverflow.com/questions/272633/add-spaces-before-capital-letters); it discusses the performance implications of using a regex and suggests alternative implementations. – Vinay B R Mar 30 '11 at 21:35

3 Answers

2

You can use a regex:

using System.Text.RegularExpressions;

static readonly Regex splitter = new Regex(@"\s+|(?=\s*[A-Z]+)|(?<=[,.?!])");

var spacedOut = splitter.Replace(str, " ");

This uses a lookahead to match the spot before a capital letter (with \s* to swallow the whitespace).
It uses a lookbehind to match the spot after punctuation.
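
To make the lookahead/lookbehind idea concrete, here is a minimal sketch of a variant. It is not the pattern from this answer: `LookaroundSpacer` and `InsertSpaces` are hypothetical names, and the pattern assumes ASCII letters only. Both alternatives are zero-width, so nothing is consumed and only spaces are inserted.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundSpacer
{
    // Illustrative pattern (an assumption, not the answer's regex):
    //   (?<=[a-z])(?=[A-Z])   the empty spot between a lowercase and an uppercase letter
    //   (?<=[,.?!])(?=\S)     the empty spot after , . ? ! when a non-whitespace character follows
    private static readonly Regex Gap =
        new Regex(@"(?<=[a-z])(?=[A-Z])|(?<=[,.?!])(?=\S)");

    // Inserts a single space at every matched spot; existing characters are left untouched.
    public static string InsertSpaces(string input) => Gap.Replace(input, " ");
}
```

With the input from the question, `LookaroundSpacer.InsertSpaces("HelloStackOverflow Users.How Are you doing?")` should give `Hello Stack Overflow Users. How Are you doing?`. The punctuation rule is deliberately naive: it would also split something like `3.14` or `e.g.`, so treat it as a starting point.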

SLaks
  • I'm not a regex expert. Will this handle acronyms, as in "IAmFromTheUSA"? The OP might want that case handled as well, so that it doesn't return "I Am From The U S A". – Matthew Cox Mar 30 '11 at 21:30
  • The regex keeps changing :-) The current version works. However, you do end up with some double spaces in strings like `"How Are You"`. – Elian Ebbing Mar 30 '11 at 21:37
  • No; it won't. That would be more complicated, requiring a negative lookbehind. – SLaks Mar 30 '11 at 21:38
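
For the acronym case raised in these comments, one possible pattern is sketched below. It uses positive lookarounds rather than a negative lookbehind, which is just one way to express the rule; `AcronymSpacer` and `InsertSpaces` are hypothetical names, and only ASCII letters are assumed.

```csharp
using System.Text.RegularExpressions;

public static class AcronymSpacer
{
    // Illustrative pattern (an assumption, not part of the answer above):
    //   (?<=[a-z])(?=[A-Z])        lowercase followed by uppercase: "mF" in "AmFrom"
    //   (?<=[A-Z])(?=[A-Z][a-z])   uppercase followed by a capitalized word: "IA" in "IAm"
    // A run of capitals with no lowercase letter after it (such as "USA") is left intact.
    private static readonly Regex WordStart =
        new Regex(@"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Inserts a space at every word boundary found by the pattern.
    public static string InsertSpaces(string input) => WordStart.Replace(input, " ");
}
```

Under these assumptions, `AcronymSpacer.InsertSpaces("IAmFromTheUSA")` should give `I Am From The USA`. This sketch does not touch punctuation or existing whitespace.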
2

It depends how you define "best".

Unless you want a trivial implementation (blindly insert a space in front of every uppercase letter), I'd avoid regex and just write the few lines of code that do precisely what I need - create a destination StringBuilder, do a foreach through the characters of the string, copying characters across and inserting extra spaces when appropriate - you'll just need to keep a state variable to know if the previous character was uppercase. This will make it easy to handle all the possible special cases (first character is uppercase, acronyms, characters following punctuation or whitespace, single words like "A", culture-sensitive handling, etc).
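
A minimal sketch of that loop, under a few assumptions: `LoopSpacer` and `InsertSpaces` are hypothetical names, the state tracked is "previous character was lowercase" (plus a punctuation flag) rather than "was uppercase", and acronyms, culture-sensitive casing and repeated punctuation such as "..." are not handled.

```csharp
using System.Text;

public static class LoopSpacer
{
    public static string InsertSpaces(string input)
    {
        var sb = new StringBuilder(input.Length + 8);
        bool prevWasLower = false;        // state: was the previous character lowercase?
        bool prevWasPunctuation = false;  // state: was the previous character one of , . ? !

        foreach (char c in input)
        {
            // A capital right after a lowercase letter starts a new word.
            bool startsCapitalizedWord = char.IsUpper(c) && prevWasLower;
            // Text glued directly onto punctuation gets a space as well.
            bool followsPunctuation = prevWasPunctuation && !char.IsWhiteSpace(c);

            if (startsCapitalizedWord || followsPunctuation)
                sb.Append(' ');

            sb.Append(c);
            prevWasLower = char.IsLower(c);
            prevWasPunctuation = c == ',' || c == '.' || c == '?' || c == '!';
        }

        return sb.ToString();
    }
}
```

For the question's input, `LoopSpacer.InsertSpaces("HelloStackOverflow Users.How Are you doing?")` should give `Hello Stack Overflow Users. How Are you doing?`. Each special case becomes one more boolean or one more condition, which is the maintainability argument being made here.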

Why wouldn't I use regex?

  • Firstly, if you want to handle all the special cases well, you'll probably need quite advanced regex skills, and the result will be an undecipherable "magic string" (difficult to read/maintain, as perfectly demonstrated by @SLaks IMHO - can you read and understand his regex in under 10 seconds?). A simple loop will be much easier to write, test, debug, read and upgrade unless you (and anyone else who might have to read/maintain your code in future) have been doing regexes for years.

  • Secondly, a loop through the characters is very simple. The regex will almost certainly be slower due to the higher level of generalisation it provides. This may or may not be an issue for you, but efficiency could be a significant factor when defining "best".

  • Thirdly, I'm an old dog and I don't see much point in using clever new tricks to solve problems that a simple for loop can handle :-) ... I often see programmers using "cool" obfuscated LINQ queries and Regexes in place of a simple 2-or-3-line loop, and it makes me think of the old adage "to a man with a hammer, everything looks like a nail". Regex, like all tools, has its place. And I'm not convinced this justifies anything that complex.

Jason Williams
0

I'm an old-school guy; I would write it using StringBuilder because I do not speak regexish:

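using System.Text;

var sb = new StringBuilder(input.Length);
int nextIndexToAdd = 0;
for (int i = 1; i < input.Length; i++)
{
    // Break before an uppercase letter that does not follow whitespace and that either
    // follows a non-uppercase character or starts a new word after an acronym.
    // char.IsLower (rather than !char.IsUpper) keeps a trailing acronym such as "USA." intact.
    if (char.IsUpper(input[i])
        && !char.IsWhiteSpace(input[i - 1])
        && (!char.IsUpper(input[i - 1]) || (i < input.Length - 1 && char.IsLower(input[i + 1]))))
    {
        // Copy the pending segment, then the separating space.
        sb.Append(input, nextIndexToAdd, i - nextIndexToAdd);
        sb.Append(' ');
        nextIndexToAdd = i;
    }
}
// Copy whatever remains after the last inserted space.
sb.Append(input, nextIndexToAdd, input.Length - nextIndexToAdd);
string result = sb.ToString();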

This handles both IAmFromUSA and HelloStack...

Snowbear