
I need to split a CamelCase string into an array of words based on the case of the letters. The rules for dividing the string are as follows:

  • Break the string in all places where a lowercase letter is followed by an uppercase letter, and break before the uppercase letter.
    • e.g.: aB -> { "a", "B" }
    • e.g.: helloWorld -> { "hello", "World" }
  • Break the string in all places where an uppercase letter is followed by a lowercase letter, and break before the uppercase letter.
    • e.g.: ABc -> { "A", "Bc" }
    • e.g.: HELLOWorld -> { "HELLO", "World" }

Some edge cases deserve examples of expected output:

  • FYYear -> { "FY", "Year" }
  • CostCenter -> { "Cost", "Center" }
  • cosTCenter -> { "cos", "T", "Center" }
  • CostcenteR -> { "Costcente", "R" }
  • COSTCENTER -> { "COSTCENTER" }
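The two rules can also be checked directly, one character at a time, without a regular expression. The following is a minimal sketch and is not taken from either answer. The class name `CamelCaseRules` is hypothetical. It assumes `char.IsUpper`/`char.IsLower` are an acceptable definition of "uppercase" and "lowercase", so non-ASCII letters are classified by their Unicode category.

```csharp
using System.Collections.Generic;

// Hypothetical helper: applies the two splitting rules with a single left-to-right scan.
static class CamelCaseRules
{
    public static List<string> Split(string input)
    {
        var words = new List<string>();
        int start = 0; // index where the current word begins

        for (int i = 1; i < input.Length; i++)
        {
            char prev = input[i - 1];
            char cur = input[i];

            // Rule 1: a lowercase letter followed by an uppercase letter,
            // so break before the uppercase letter (hello|World).
            bool lowerToUpper = char.IsLower(prev) && char.IsUpper(cur);

            // Rule 2: an uppercase letter followed by a lowercase letter,
            // so break before that uppercase letter (HELLO|World).
            bool upperBeforeLower = char.IsUpper(cur)
                && i + 1 < input.Length
                && char.IsLower(input[i + 1]);

            if (lowerToUpper || upperBeforeLower)
            {
                words.Add(input.Substring(start, i - start));
                start = i;
            }
        }

        // The last word runs to the end of the string (skipped for empty input).
        if (input.Length > 0)
            words.Add(input.Substring(start));

        return words;
    }
}
```

With this sketch, `string.Join(" ", CamelCaseRules.Split(updateCaption))` would produce the space-separated caption the question's own code was aiming for.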

I've tried using a regular expression as shown in the code below:

updateCaption = string.Join(" ", Regex.Split(updateCaption, @"(?<!^)(?=[A-Z])"));

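But this doesn't work: it splits before every uppercase letter, so FYYear becomes "F Y Year" and COSTCENTER becomes "C O S T C E N T E R".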

Wyck
tvb108108
  • What if you had something like `cosTCenter`? – Sach Aug 12 '19 at 21:26
  • Or `CostcenteR` – Sach Aug 12 '19 at 21:27
  • Or `COSTCENTER` – Sach Aug 12 '19 at 21:28
  • What I'm trying to say is that your problem statement may not be well defined. – Sach Aug 12 '19 at 21:28
  • What makes FYY required to split on FY and Y? Is it that there are three CAPS together? – MichaelD Aug 12 '19 at 21:28
  • This question does not seem too broad to me. I'm surprised this is on hold. There's a specific problem definition and expected input and output. Alas, something needs to be done, so I'll take a shot at rewriting it to be more concise. Hopefully still in alignment with @tvb108108's intent. – Wyck Aug 13 '19 at 01:11
  • @Wyck Thank you Wyck for your comments. I thought I had provided enough information about what I needed to do. Your updates look great and are worded a lot better. – tvb108108 Aug 13 '19 at 13:18
  • @wyck all yours now... – Flexo Aug 14 '19 at 00:52
  • Probably worth referencing some great solutions in other languages: [JavaScript solution](https://stackoverflow.com/a/18379358/1563833) and [Java solution](https://stackoverflow.com/a/7594052/1563833). You can craft a C# implementation that uses the same approach. – Wyck Aug 14 '19 at 03:24

2 Answers


This RegEx should do the trick:

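// Requires: using System.Text.RegularExpressions;
// Inserts a space at each word boundary, e.g. "HELLOWorld" -> "HELLO World".
private static string InsertSpaces(string input) {
    var regex = new Regex(@"(?<=[A-Z])(?=[A-Z][a-z])|(?<=[^A-Z])(?=[A-Z])");
    return regex.Replace(input, " ");
}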

I copied the formatting from https://regex101.com/r/ahah3D/2 for further explanation:

[Image: regex101's formatted breakdown of the pattern]

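The pattern has two alternatives, and each one matches a zero-width position rather than any characters. The first, (?<=[A-Z])(?=[A-Z][a-z]), matches a position that comes after an uppercase letter and before an uppercase letter that is itself followed by a lowercase letter, so HELLOWorld is split into HELLO and World. The second, (?<=[^A-Z])(?=[A-Z]), matches a position that comes after any character other than an uppercase letter and before an uppercase letter. That covers your standard case, i.e. a lowercase letter followed by an uppercase letter, as in helloWorld.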
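To see which boundaries each alternative contributes, the two halves of the pattern can be run separately with `Regex.Matches` and their match positions listed. This is a small illustrative sketch; the class, constant, and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class BoundaryAlternatives
{
    // First alternative: after an uppercase letter, before an uppercase letter
    // that is itself followed by a lowercase letter (HELLO|World).
    public const string UpperToCapitalized = @"(?<=[A-Z])(?=[A-Z][a-z])";

    // Second alternative: after any character that is not an uppercase letter,
    // before an uppercase letter (hello|World).
    public const string NonUpperToUpper = @"(?<=[^A-Z])(?=[A-Z])";

    // Returns the string index of every (zero-width) match of the pattern.
    public static IEnumerable<int> Positions(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index);
}
```

For cosTCenter, the first alternative matches only at index 4 (between T and Center), and the second matches only at index 3 (between cos and T). The combined pattern needs both. Because `[^A-Z]` also matches a space, input that already contains a space before a capital (e.g. "Hello World") gets a second space inserted there.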

Let me know if that solves your question.

div

Here's my approach:

static IEnumerable<string> SplitCamelCase(string input)
{
   return Regex.Split(input, @"([A-Z]?[a-z]+)").Where(str => !string.IsNullOrEmpty(str));
}

It works by splitting the string using "an uppercase letter followed by one or more lowercase letters" (or just one or more lowercase letters) as the delimiter. Regex.Split includes the delimiters in the result array if they are captured in parentheses (and they are, in my example). What remains between the delimiters are the runs of capital letters that don't begin a capitalized word (for FYYear, the FY before Year), and Regex.Split includes those in the array naturally. It does produce superfluous empty strings wherever the input starts or ends with a delimiter or two delimiters are adjacent, but they can be filtered out; I did so with a .Where clause.
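The empty strings are easiest to see by looking at what `Regex.Split` returns before the `.Where` filter is applied. This sketch (the `RawSplitDemo` name is hypothetical) simply calls `Regex.Split` with the answer's pattern:

```csharp
using System.Text.RegularExpressions;

static class RawSplitDemo
{
    // The answer's pattern without the .Where filter. The capturing
    // parentheses make Regex.Split include each matched word in the result.
    public static string[] RawSplit(string input) =>
        Regex.Split(input, @"([A-Z]?[a-z]+)");
}
```

For cosTCenter this returns "", "cos", "T", "Center", "", with an empty string before the first delimiter and another after the last. For CostCenter there is also an empty string between the adjacent delimiters Cost and Center. For COSTCENTER nothing matches, so the whole input comes back as a single element.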

It's not bad. I only wish there were a cleaner way to eliminate the empty strings.
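One possible way to avoid the empty strings entirely is to match the words instead of splitting on them. The sketch below is an alternative and not the answer's method. It uses `Regex.Matches` with a pattern that accepts either an optional capital followed by lowercase letters, or a run of capitals that is not followed by a lowercase letter. The `MatchWords` name is hypothetical. It assumes the input contains only ASCII letters: any other character (digits, spaces, punctuation) is silently dropped, whereas `Regex.Split` would keep it in the gaps.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class MatchWords
{
    // A word is either [A-Z]?[a-z]+ (e.g. "Year", "cos") or a run of
    // capitals not immediately followed by a lowercase letter (e.g. "FY", "T", "R").
    static readonly Regex Word = new Regex(@"[A-Z]?[a-z]+|[A-Z]+(?![a-z])");

    // Each match is one word, so no empty strings are produced.
    public static IEnumerable<string> SplitCamelCase(string input) =>
        Word.Matches(input).Cast<Match>().Select(m => m.Value);
}
```

The negative lookahead is what makes FYYear come out as FY and Year: the capital run backs off by one letter so that the final Y can start the next word.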

By the way, I elected to return IEnumerable<string> because I feel like that format is more reusable. But you can always .ToArray() the result if you prefer an array, or the result can be joined with spaces using string.Join(" ", result) to form your corrected string.

Here's a complete demonstration:

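using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program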
{
    static IEnumerable<string> SplitCamelCase(string input)
    {
        return Regex.Split(input, @"([A-Z]?[a-z]+)").Where(str => !string.IsNullOrEmpty(str));
    }

    static void Main(string[] args)
    {
        string[] examples = new string[] {
            "FYYear",
            "CostCenter",
            "cosTCenter",
            "CostcenteR",
            "COSTCENTER"
        };
        foreach (string str in examples) {
            Console.WriteLine("{0, 10} -> {1}", str, String.Join(" ", SplitCamelCase(str)));
        }
    }
}

Output:

    FYYear -> FY Year
CostCenter -> Cost Center
cosTCenter -> cos T Center
CostcenteR -> Costcente R
COSTCENTER -> COSTCENTER
Wyck