220

Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals", what is the best way to add spaces before the capital letters? The end string should be "This String Has No Spaces But It Does Have Capitals".

Here is my attempt with a regex:

System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
Blair Conrad
Bob
  • 3
    Do you have a particular complaint about the approach you've taken? That might help us improve upon your method. – Blair Conrad Nov 07 '08 at 16:36
  • If the regex works, then I'd stick with that. Regex is optimized for string manipulation. – Michael Meadows Nov 07 '08 at 16:39
  • I am just curious whether there is a better or perhaps even a built-in approach. I'd even be curious to see other approaches in other languages. – Bob Nov 07 '08 at 16:51
  • 5
    Your code simply didn't work because the modified string is the return value of the 'Replace' function. With this line of code: 'System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0").Trim();' it would work perfectly. (Just commenting because I stumbled over this post and nobody had really pointed out what was wrong with your code.) – Mattu475 Apr 22 '15 at 13:46
  • Regex.Replace("ThisStringHasNoSpacesButItDoesHaveCapitals", @"\B[A-Z]", m => " " + m); – saquib adil Feb 25 '19 at 13:53

32 Answers

222

The regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).

This function

string AddSpacesToSentence(string text, bool preserveAcronyms)
{
        if (string.IsNullOrWhiteSpace(text))
           return string.Empty;
        StringBuilder newText = new StringBuilder(text.Length * 2);
        newText.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]))
                if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
                    (preserveAcronyms && char.IsUpper(text[i - 1]) && 
                     i < text.Length - 1 && !char.IsUpper(text[i + 1])))
                    newText.Append(' ');
            newText.Append(text[i]);
        }
        return newText.ToString();
}

will do it 100,000 times in 2,968,750 ticks; the regex will take 25,000,000 ticks (and that's with the regex compiled).

It's better, for a given value of better (i.e. faster); however, it's more code to maintain. "Better" is often a compromise between competing requirements.

Update
It's a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).

On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand coded function 4,517,177 ticks, and the Regex below takes 59,435,719 making the Hand coded function run in 7.6% of the time it takes the Regex.
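
A rough harness for reproducing this kind of measurement might look like the sketch below. The class and method names are invented for illustration, and the regex is assumed to be the simple `([a-z])([A-Z])` replacement posted in another answer, since the exact pattern that was timed is not shown here. Tick counts will differ by machine and runtime, so only the ratio is worth comparing.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class SpacingBenchmark
{
    // Assumption: the regex under test is the simple lower/upper pair replacement.
    private static readonly Regex LowerUpper =
        new Regex("([a-z])([A-Z])", RegexOptions.Compiled);

    // The simple hand-coded loop from this answer (no acronym handling).
    public static string HandCoded(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return "";
        var newText = new StringBuilder(text.Length * 2);
        newText.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && text[i - 1] != ' ')
                newText.Append(' ');
            newText.Append(text[i]);
        }
        return newText.ToString();
    }

    public static string WithRegex(string text)
    {
        return LowerUpper.Replace(text, "$1 $2");
    }

    // Runs the conversion repeatedly and returns the elapsed Stopwatch ticks.
    public static long Time(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up call so one-time start-up cost is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            convert(input);
        watch.Stop();
        return watch.ElapsedTicks;
    }

    public static void Main()
    {
        // 'Abbbbbbbbb' repeated 100 times, i.e. 1,000 characters.
        string input = string.Concat(Enumerable.Repeat("Abbbbbbbbb", 100));
        Console.WriteLine("Hand coded: {0} ticks", Time(HandCoded, input, 100000));
        Console.WriteLine("Regex:      {0} ticks", Time(WithRegex, input, 100000));
    }
}
```

Both conversions produce the same string for this input, so the comparison is like for like.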

Update 2: Will it take acronyms into account? It will now! The logic of the if statement is fairly obscure, as you can see expanding it to this ...

if (char.IsUpper(text[i]))
    if (char.IsUpper(text[i - 1]))
        if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
            newText.Append(' ');
        else ;
    else if (text[i - 1] != ' ')
        newText.Append(' ');

... doesn't help at all!
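
One way to make the condition easier to read is to give each half a name. The sketch below is only a restatement of the same test; `NeedsSpaceBefore`, `startsNewWord` and `followsAcronym` are hypothetical names introduced for illustration, not part of the function above.

```csharp
using System;
using System.Text;

public static class AcronymSpacing
{
    // Hypothetical helper: decides whether a space goes before text[i] (i >= 1).
    public static bool NeedsSpaceBefore(string text, int i, bool preserveAcronyms)
    {
        if (!char.IsUpper(text[i]))
            return false;

        char previous = text[i - 1];
        bool hasNext = i < text.Length - 1;

        // A capital after anything that is neither a space nor a capital starts a new word.
        bool startsNewWord = previous != ' ' && !char.IsUpper(previous);

        // A capital after a capital starts a new word only if a non-capital follows,
        // i.e. it is the first letter of the word that comes after an acronym.
        bool followsAcronym = preserveAcronyms && char.IsUpper(previous)
                              && hasNext && !char.IsUpper(text[i + 1]);

        return startsNewWord || followsAcronym;
    }

    public static string AddSpacesToSentence(string text, bool preserveAcronyms)
    {
        if (string.IsNullOrWhiteSpace(text))
            return string.Empty;
        var newText = new StringBuilder(text.Length * 2);
        newText.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (NeedsSpaceBefore(text, i, preserveAcronyms))
                newText.Append(' ');
            newText.Append(text[i]);
        }
        return newText.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(AddSpacesToSentence("DriveIsSCSICompatible", true));  // Drive Is SCSI Compatible
        Console.WriteLine(AddSpacesToSentence("DriveIsSCSICompatible", false)); // Drive Is SCSICompatible
    }
}
```

With the flag off, a capital that follows another capital never gets a space, which is why the acronym and the following word run together.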

Here's the original simple method that doesn't worry about Acronyms

string AddSpacesToSentence(string text)
{
        if (string.IsNullOrWhiteSpace(text))
           return "";
        StringBuilder newText = new StringBuilder(text.Length * 2);
        newText.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && text[i - 1] != ' ')
                newText.Append(' ');
            newText.Append(text[i]);
        }
        return newText.ToString();
}
starball
Binary Worrier
  • 10
    if (char.IsUpper (text [i]) && text[i - 1] != ' ') If you re-run the code above it keeps adding spaces, this will stop spaces being added if there is a space before the capital letter. – Paul Talbot Oct 26 '10 at 08:32
  • I am not sure so I thought I would ask, does this method handle acronyms as described in Martin Brown's answer "DriveIsSCSICompatible" would ideally become "Drive Is SCSI Compatible" – Paul C Jul 23 '13 at 15:39
  • That made it 1 character by replacing the contents of your for statement with the newly updated if statements, I may be doing something wrong? – Paul C Jul 23 '13 at 16:36
  • I think so, I just pasted the full function into a test project and it worked a treat, sorry. – Binary Worrier Jul 24 '13 at 07:06
  • with this solution, "407 ETR Customer Service" is converted to "407 ET R Customer Service" and "PAR-MED" is converted to "PA R-ME D", both incorrect – Julien Nov 26 '15 at 16:40
  • "2ND" gets changed to "2 ND", "SPECTRE-DVD" gets changed to "SPECTRE- DVD", both seem incorrect to me – Julien Aug 30 '16 at 16:11
  • Expensive? I've never had a regular expression cause a performance issue of any kind. In many cases, they are faster than the long form algorithm. – Jordan May 23 '18 at 15:35
  • 1
    Adding a check for char.IsLetter(text[i + 1]) helps with acronyms with special characters and digits (i.e. ABC_DEF wont get split as AB C_DEF). – HeXanon Oct 03 '18 at 08:33
  • I am getting space before some acronyms. I will suggest to add text[i + 1] != ' ' in the last. if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) || (preserveAcronyms && char.IsUpper(text[i - 1]) && i < text.Length - 1 && !char.IsUpper(text[i + 1]) && text[i + 1] != ' ')) – Kishan Choudhary Feb 21 '20 at 09:50
  • 1
    I'm not sure the acronyms part is correct when its turned OFF. I just ran a test "ASentenceABC" expands to "ASentence A B C". Should be "A Sentence A B C" – Tim Rutter Apr 12 '20 at 07:25
  • You should make it a string extension `this string text` – Kellen Stuart Mar 05 '21 at 18:04
  • I think this is what you're after as with the acronym part you just care if the previous value wasn't upper I think `var previousWasntUpper = previous != ' ' && !char.IsUpper(previous); if (preserveAcronyms || previousWasntUpper)` – Tim Apr 21 '21 at 04:33
  • In `AddSpacesToSentence(string text, bool preserveAcronyms)`, once you determine `text[i]` is uppercase, the first condition works better as `(text[i - 1] != ' ' && (!preserveAcronyms || !char.IsUpper(text[i - 1])))` so the word A gets handled correctly, e.g. `YouAreAGenius` becomes `You Are A Genius` rather than `You Are AGenius`. – gknicker Jul 19 '23 at 15:35
178

Your solution has an issue in that it puts a space before the first letter T so you get

" This String..." instead of "This String..."

To get around this look for the lower case letter preceding it as well and then insert the space in the middle:

newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");

Edit 1:

If you use @"(\p{Ll})(\p{Lu})" it will pick up accented characters as well.

Edit 2:

If your strings can contain acronyms you may want to use this:

newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");

So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible"
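
The three patterns can be compared side by side with a small sketch like the one below. The class and method names are made up for the example; only the patterns themselves come from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexVariants
{
    // ASCII only: a space between a lower-case and an upper-case letter.
    public static string Ascii(string value)
    {
        return Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
    }

    // Unicode letter categories, so accented letters count too (Edit 1).
    public static string Unicode(string value)
    {
        return Regex.Replace(value, @"(\p{Ll})(\p{Lu})", "$1 $2");
    }

    // Acronym-aware pattern (Edit 2).
    public static string Acronyms(string value)
    {
        return Regex.Replace(value,
            @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
    }

    public static void Main()
    {
        Console.WriteLine(Ascii("ThisStringHasNoSpaces"));    // This String Has No Spaces
        Console.WriteLine(Ascii("DriveIsSCSICompatible"));    // Drive Is SCSICompatible
        Console.WriteLine(Unicode("El\u00C1lamoTejano"));     // El Álamo Tejano
        Console.WriteLine(Acronyms("DriveIsSCSICompatible")); // Drive Is SCSI Compatible
    }
}
```

The first two patterns only split between a lower-case and an upper-case letter, so the word after an acronym stays attached to it; the third also splits before a capital that is followed by a lower-case letter.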

Martin Brown
102

I didn't test performance, but here it is in one line with LINQ:

var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');
Robert Levy
EtienneT
24

I know this is an old one, but this is an extension I use when I need to do this:

public static class Extensions
{
    public static string ToSentence( this string Input )
    {
        return new string(Input.SelectMany((c, i) => i > 0 && char.IsUpper(c) ? new[] { ' ', c } : new[] { c }).ToArray());
    }
}

This will allow you to use MyCasedString.ToSentence()

Andreas Zita
Rob Hardy
  • I like the idea of this as an extension method, if you add `TrimStart(' ')` it will remove the leading space. – user1069816 Jun 22 '15 at 10:05
  • 1
    Thanks @user1069816. I have changed the extension to use the overload of `SelectMany` which includes an index, this way it avoids the first letter and the unnecessary potential overhead of an additional call to `TrimStart(' ')`. Rob. – Rob Hardy Jun 25 '15 at 13:45
  • Does not handle acronyms. HasCICDHidden => Has C I C D Hidden – amr ras Dec 17 '22 at 03:37
11

I set out to make a simple extension method based on Binary Worrier's code which will handle acronyms properly, and is repeatable (won't mangle already spaced words). Here is my result.

public static string UnPascalCase(this string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    var newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        var currentUpper = char.IsUpper(text[i]);
        var prevUpper = char.IsUpper(text[i - 1]);
        var nextUpper = (text.Length > i + 1) ? char.IsUpper(text[i + 1]) || char.IsWhiteSpace(text[i + 1]): prevUpper;
        var spaceExists = char.IsWhiteSpace(text[i - 1]);
        if (currentUpper && !spaceExists && (!nextUpper || !prevUpper))
                newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

Here are the unit test cases this function passes. I added most of tchrist's suggested cases to this list. The three of those it doesn't pass (two are just Roman numerals) are commented out:

Assert.AreEqual("For You And I", "ForYouAndI".UnPascalCase());
Assert.AreEqual("For You And The FBI", "ForYouAndTheFBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "AManAPlanACanalPanama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNSServer".UnPascalCase());
Assert.AreEqual("For You And I", "For You And I".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "MountMᶜKinleyNationalPark".UnPascalCase());
Assert.AreEqual("El Álamo Tejano", "ElÁlamoTejano".UnPascalCase());
Assert.AreEqual("The Ævar Arnfjörð Bjarmason", "TheÆvarArnfjörðBjarmason".UnPascalCase());
Assert.AreEqual("Il Caffè Macchiato", "IlCaffèMacchiato".UnPascalCase());
//Assert.AreEqual("Mister Dženan Ljubović", "MisterDženanLjubović".UnPascalCase());
//Assert.AreEqual("Ole King Henry Ⅷ", "OleKingHenryⅧ".UnPascalCase());
//Assert.AreEqual("Carlos Ⅴº El Emperador", "CarlosⅤºElEmperador".UnPascalCase());
Assert.AreEqual("For You And The FBI", "For You And The FBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "A Man A Plan A Canal Panama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNS Server".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "Mount Mᶜ Kinley National Park".UnPascalCase());
johnny 5
Kevin Stricker
9

Welcome to Unicode

All these solutions are essentially wrong for modern text. You need to use something that understands case. Since Bob asked for other languages, I'll give a couple for Perl.

I provide four solutions, ranging from worst to best. Only the best one is always right. The others have problems. Here is a test run to show you what works and what doesn’t, and where. I’ve used underscores so that you can see where the spaces have been put, and I’ve marked as wrong anything that is, well, wrong.

Testing TheLoneRanger
               Worst:    The_Lone_Ranger
               Ok:       The_Lone_Ranger
               Better:   The_Lone_Ranger
               Best:     The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
     [WRONG]   Worst:    Mount_MᶜKinley_National_Park
     [WRONG]   Ok:       Mount_MᶜKinley_National_Park
     [WRONG]   Better:   Mount_MᶜKinley_National_Park
               Best:     Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
     [WRONG]   Worst:    ElÁlamo_Tejano
               Ok:       El_Álamo_Tejano
               Better:   El_Álamo_Tejano
               Best:     El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
     [WRONG]   Worst:    TheÆvar_ArnfjörðBjarmason
               Ok:       The_Ævar_Arnfjörð_Bjarmason
               Better:   The_Ævar_Arnfjörð_Bjarmason
               Best:     The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
     [WRONG]   Worst:    Il_CaffèMacchiato
               Ok:       Il_Caffè_Macchiato
               Better:   Il_Caffè_Macchiato
               Best:     Il_Caffè_Macchiato
Testing MisterDženanLjubović
     [WRONG]   Worst:    MisterDženanLjubović
     [WRONG]   Ok:       MisterDženanLjubović
               Better:   Mister_Dženan_Ljubović
               Best:     Mister_Dženan_Ljubović
Testing OleKingHenryⅧ
     [WRONG]   Worst:    Ole_King_HenryⅧ
     [WRONG]   Ok:       Ole_King_HenryⅧ
     [WRONG]   Better:   Ole_King_HenryⅧ
               Best:     Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
     [WRONG]   Worst:    CarlosⅤºEl_Emperador
     [WRONG]   Ok:       CarlosⅤº_El_Emperador
     [WRONG]   Better:   CarlosⅤº_El_Emperador
               Best:     Carlos_Ⅴº_El_Emperador

BTW, almost everyone here has selected the first way, the one marked "Worst". A few have selected the second way, marked "OK". But no one else before me has shown you how to do either the "Better" or the "Best" approach.

Here is the test program with its four methods:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

# First I'll prove these are fine variable names:
my (
    $TheLoneRanger              ,
    $MountMᶜKinleyNationalPark  ,
    $ElÁlamoTejano              ,
    $TheÆvarArnfjörðBjarmason   ,
    $IlCaffèMacchiato           ,
    $MisterDženanLjubović         ,
    $OleKingHenryⅧ              ,
    $CarlosⅤºElEmperador        ,
);

# Now I'll load up some string with those values in them:
my @strings = qw{
    TheLoneRanger
    MountMᶜKinleyNationalPark
    ElÁlamoTejano
    TheÆvarArnfjörðBjarmason
    IlCaffèMacchiato
    MisterDženanLjubović
    OleKingHenryⅧ
    CarlosⅤºElEmperador
};

my($new, $best, $ok);
my $mask = "  %10s   %-8s  %s\n";

for my $old (@strings) {
    print "Testing $old\n";
    ($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;

    ($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Worst:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Ok:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Better:", $new;

    ($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Best:", $new;
}

When you can score the same as the "Best" on this dataset, you’ll know you’ve done it correctly. Until then, you haven’t. No one else here has done better than "Ok", and most have done it "Worst". I look forward to seeing someone post the correct ℂ♯ code.
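
A rough C# counterpart of the "Better" pattern can be sketched with .NET's general-category escapes. This is not the "Best" solution: as far as I know, .NET regular expressions expose general categories such as `\p{Lu}`, `\p{Ll}` and `\p{Lt}` but not the derived `Lowercase` and `Uppercase` properties, so the Roman numeral and modifier-letter cases would still be missed. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeSpacing
{
    // Zero-width match between a lower-case letter and an upper-case or title-case letter.
    private static readonly Regex Boundary =
        new Regex(@"(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])");

    public static string SplitWords(string value)
    {
        // Underscores are used, as in the Perl test run, so the insertions are visible.
        return Boundary.Replace(value, "_");
    }

    public static void Main()
    {
        Console.WriteLine(SplitWords("TheLoneRanger"));       // The_Lone_Ranger
        Console.WriteLine(SplitWords("El\u00C1lamoTejano"));  // El_Álamo_Tejano
        // \u01C5 and \u01C8 are the title-case digraphs (category Lt) used in the test data.
        Console.WriteLine(SplitWords("Mister\u01C5enan\u01C8ubovi\u0107"));
    }
}
```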

I notice that StackOverflow’s highlighting code is miserably stoopid again. They’re making all the same old lame mistakes that (most but not all) of the rest of the poor approaches mentioned here have made. Isn’t it long past time to put ASCII to rest? It doesn’t make sense anymore, and pretending it’s all you have is simply wrong. It makes for bad code.

tchrist
  • your 'Best' answer seems the closest so far, but it doesn't seem like it accounts for leading punctuation or other leading non-lowercase letters. This seems to work best for me (in java): replaceAll("(?<=[^^\\p{javaUpperCase}])(?=[\\p{javaUpperCase}])"," "); – Randyaa Jun 30 '11 at 14:18
  • Hmm. I'm not sure Roman numerals should really count as uppercase in this example. The modifier letter example definitely shouldn't be counted. If you go to McDonalds.com you will see it is written without a space. – Martin Brown Mar 12 '12 at 11:47
  • It should also be noted that you will never get this to be perfect. For example, I would like to see an example that sorts out "AlexandervonHumboldt", which should end up as "Alexander von Humboldt". Then there are of course languages that don't have the distinction between capital and lowercase. – Martin Brown Mar 12 '12 at 12:07
5

This Regex places a space character in front of every capital letter:

using System.Text.RegularExpressions;

const string myStringWithoutSpaces = "ThisIsAStringWithoutSpaces";
var myStringWithSpaces = Regex.Replace(myStringWithoutSpaces, "([A-Z])([a-z]*)", " $1$2");

Mind the space in front of "$1$2"; this is what gets it done.

This is the outcome:

"This Is A String Without Spaces"
4

Inspired by @MartinBrown: two lines of simple regex that will resolve your name, including acronyms anywhere in the string.

public string ResolveName(string name)
{
   var tmpDisplay = Regex.Replace(name, "([^A-Z ])([A-Z])", "$1 $2");
   return Regex.Replace(tmpDisplay, "([A-Z]+)([A-Z][^A-Z$])", "$1 $2").Trim();
}
johnny 5
  • I like this solution. It is short and fast. However, similar to other solutions, It fails with string "RegularOTs". Every solution I tried here returns "Regular O Ts" – Patee Gutee Jan 29 '18 at 19:30
  • @PateeGutee the OP wanted a space before capitals; he didn’t mention abbreviations. We have a fix for that in production code – johnny 5 Jan 29 '18 at 19:35
  • Can you show the fix? I have strings like this in my data and it is giving me incorrect result. Thanks. – Patee Gutee Jan 29 '18 at 19:43
  • @PateeGutee Sorry, I misread what you wanted. Pluralization is a different issue. For `RegularOTs`, what are you expecting to happen: "Regular OTs" or "Regular OT s"? – johnny 5 Jan 30 '18 at 02:04
  • I'm expecting something like the following: "RegularOTs" -> "Regular OTs"... "BrowseFAQsPartly" -> "Browse FAQs Partly" – Patee Gutee Jan 31 '18 at 14:53
  • @PateeGutee Are you expecting this to work with only `s`, what about 'es'? Basically what rules are you expecting, if there is an Acronym trailing by an 's' or if there is an Acryonym trail by a single lower case? – johnny 5 Jan 31 '18 at 17:59
  • 1
    @PateeGutee I've updated my answer for you, I believe that should work – johnny 5 Jan 31 '18 at 19:35
  • Your updated solution handles plural acronyms pretty well. However, it fails with string "G7799CertifiedFRs". I am expecting "G7799 Certified FRs" but your updated solution only gave "Certified FRs" missing "G7799". Your original solution gave "G7799 Certified F Rs". – Patee Gutee Feb 01 '18 at 17:35
  • Try replacing [A-Z] with [A-Z1-9] – johnny 5 Feb 01 '18 at 18:25
  • Getting there. "Y2000CertifiedTRs" -> "Y2000Certified TRs", "Y2000TRs" -> "Y2000TRs" which are both missing single space after "Y2000". – Patee Gutee Feb 01 '18 at 20:23
  • The following tests are successful: "AdvanceABCs" -> "Advance ABCs", "Advance123s" -> "Advance 123s", "ABCsAdvance" -> "ABCs Advance", "123sAdvance" -> "123s Advance" – Patee Gutee Feb 01 '18 at 20:23
  • I’ve removed the edit since my solution was incomplete. Just wondering, why are you using regex for this? Wouldn’t it be a whole lot quicker to just manually write a function that splits after the last capital letter, number, or s? – johnny 5 Feb 01 '18 at 20:33
  • @PateeGutee, the real issue with regex is that it's time consuming to understand, and hard to maintain. Imagine 1 year from now you need to add something else like support for pluralization for `es` or something, and it also enforces a standard that the people who are working on the code should know regex. It seems you have alot of tests for this already why not implement some TDD, and outline a set of definitive rules, Such as handling accented characters etc... – johnny 5 Feb 01 '18 at 21:55
4

Make sure you aren't putting spaces at the beginning of the string, but you are putting them between consecutive capitals. Some of the answers here don't address one or both of those points. There are other ways than regex, but if you prefer to use that, try this:

Regex.Replace(value, @"\B[A-Z]", " $0")

The \B is a negated \b, so it represents a non-word-boundary. It means the pattern matches "Y" in XYzabc, but not in Yzabc or X Yzabc. As a little bonus, you can use this on a string with spaces in it and it won't double them.
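
The cases described above can be checked with a short sketch; the class and method names are placeholders chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonBoundarySpacing
{
    public static string AddSpaces(string value)
    {
        // \B only matches where the capital has a word character right before it.
        return Regex.Replace(value, @"\B[A-Z]", " $0");
    }

    public static void Main()
    {
        Console.WriteLine(AddSpaces("XYzabc"));                // X Yzabc
        Console.WriteLine(AddSpaces("Yzabc"));                 // Yzabc
        Console.WriteLine(AddSpaces("X Yzabc"));               // X Yzabc
        Console.WriteLine(AddSpaces("ThisStringHasNoSpaces")); // This String Has No Spaces
    }
}
```

Note that digits and underscores are word characters too, so a capital that directly follows one of them also gets a space.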

Justin Morgan - On strike
4

Binary Worrier, I have used your suggested code, and it is rather good, I have just one minor addition to it:

public static string AddSpacesToSentence(string text)
{
    if (string.IsNullOrEmpty(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]) && !char.IsUpper(text[i - 1]))
        {
            // A capital after a non-capital starts a new word.
            newText.Append(' ');
        }
        else if (i < text.Length - 1)
        {
            // Only look ahead when there is a next character to look at.
            if (char.IsUpper(text[i]) && !char.IsUpper(text[i + 1]))
                newText.Append(' ');
        }
        newText.Append(text[i]);
    }
    return newText.ToString();
}

I have added a condition !char.IsUpper(text[i - 1]). This fixed a bug that would cause something like 'AverageNOX' to be turned into 'Average N O X', which is obviously wrong, as it should read 'Average NOX'.

Sadly this still has the bug that if you have the text 'FromAStart', you would get 'From AStart' out.

Any thoughts on fixing this?

the Tin Man
  • Maybe something like this would work: char.IsUpper(text[i]) && (char.IsLower(text[i - 1]) || (char.IsLower(text[i+1])) – Martin Brown Oct 22 '09 at 16:52
  • 1
    This is the correct one: `if (char.IsUpper(text[i]) && !(char.IsUpper(text[i - 1]) && char.IsUpper(text[i + 1])))` Test result: "From Start", "From THE Start", "From A Start" but you need `i < text.Length - 1` in the for loop condition to ignore the last character and prevent out of range exception. – CallMeLaNN Mar 18 '11 at 08:10
  • Oh it just the same. !(a && b) and (!a || !b) because lower = !upper. – CallMeLaNN Mar 18 '11 at 08:13
  • what is the result? – git test Jun 11 '21 at 11:47
3

Here's mine:

private string SplitCamelCase(string s) 
{ 
    Regex upperCaseRegex = new Regex(@"[A-Z]{1}[a-z]*"); 
    MatchCollection matches = upperCaseRegex.Matches(s); 
    List<string> words = new List<string>(); 
    foreach (Match match in matches) 
    { 
        words.Add(match.Value); 
    } 
    return String.Join(" ", words.ToArray()); 
}
George Stocker
Cory Foy
2

Here is how you could do it in SQL

create  FUNCTION dbo.PascalCaseWithSpace(@pInput AS VARCHAR(MAX)) RETURNS VARCHAR(MAX)
BEGIN
    declare @output varchar(8000)

set @output = ''


Declare @vInputLength        INT
Declare @vIndex              INT
Declare @vCount              INT
Declare @PrevLetter varchar(50)
SET @PrevLetter = ''

SET @vCount = 0
SET @vIndex = 1
SET @vInputLength = LEN(@pInput)

WHILE @vIndex <= @vInputLength
BEGIN
    IF ASCII(SUBSTRING(@pInput, @vIndex, 1)) = ASCII(Upper(SUBSTRING(@pInput, @vIndex, 1)))
       begin 

        if(@PrevLetter != '' and ASCII(@PrevLetter) = ASCII(Lower(@PrevLetter)))
            SET @output = @output + ' ' + SUBSTRING(@pInput, @vIndex, 1)
            else
            SET @output = @output +  SUBSTRING(@pInput, @vIndex, 1) 

        end
    else
        begin
        SET @output = @output +  SUBSTRING(@pInput, @vIndex, 1) 

        end

set @PrevLetter = SUBSTRING(@pInput, @vIndex, 1) 

    SET @vIndex = @vIndex + 1
END


return @output
END
KCITGuy
2

What you have works perfectly. Just remember to reassign value to the return value of this function.

value = System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0");
Bill the Lizard
1
static string AddSpacesToColumnName(string columnCaption)
{
    if (string.IsNullOrWhiteSpace(columnCaption))
        return "";
    StringBuilder newCaption = new StringBuilder(columnCaption.Length * 2);
    newCaption.Append(columnCaption[0]);
    int pos;
    // Stop one short of the end so columnCaption[pos + 1] is always valid.
    for (pos = 1; pos < columnCaption.Length - 1; pos++)
    {
        if (char.IsUpper(columnCaption[pos]) && !(char.IsUpper(columnCaption[pos - 1]) && char.IsUpper(columnCaption[pos + 1])))
            newCaption.Append(' ');
        newCaption.Append(columnCaption[pos]);
    }
    // Append the final character, if the caption has more than one.
    if (pos < columnCaption.Length)
        newCaption.Append(columnCaption[pos]);
    return newCaption.ToString();
}
cyril
1

In Ruby, via Regexp:

"FooBarBaz".gsub(/(?!^)(?=[A-Z])/, ' ') # => "Foo Bar Baz"
Artem
  • 1
    Oops, sorry. I've missed that it's C#-specific question and posted here Ruby answer :( – Artem Jul 26 '12 at 20:29
1

I took Kevin Stricker's excellent solution and converted it to VB. Since I'm locked into .NET 3.5, I also had to write IsNullOrWhiteSpace. This passes all of his tests.

<Extension()>
Public Function IsNullOrWhiteSpace(value As String) As Boolean
    If value Is Nothing Then
        Return True
    End If
    For i As Integer = 0 To value.Length - 1
        If Not Char.IsWhiteSpace(value(i)) Then
            Return False
        End If
    Next
    Return True
End Function

<Extension()>
Public Function UnPascalCase(text As String) As String
    If text.IsNullOrWhiteSpace Then
        Return String.Empty
    End If

    Dim newText = New StringBuilder()
    newText.Append(text(0))
    For i As Integer = 1 To text.Length - 1
        Dim currentUpper = Char.IsUpper(text(i))
        Dim prevUpper = Char.IsUpper(text(i - 1))
        Dim nextUpper = If(text.Length > i + 1, Char.IsUpper(text(i + 1)) Or Char.IsWhiteSpace(text(i + 1)), prevUpper)
        Dim spaceExists = Char.IsWhiteSpace(text(i - 1))
        If (currentUpper And Not spaceExists And (Not nextUpper Or Not prevUpper)) Then
            newText.Append(" ")
        End If
        newText.Append(text(i))
    Next
    Return newText.ToString()
End Function
Brad Irby
1

The question is a bit old, but nowadays there is a nice library on NuGet that does exactly this, as well as many other conversions to human-readable text.

Check out Humanizer on GitHub or NuGet.

Example

"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Underscored_input_String_is_turned_INTO_sentence".Humanize() => "Underscored input String is turned INTO sentence"

// acronyms are left intact
"HTML".Humanize() => "HTML"
Jonas Pegerfalk
  • Just tried that and the first link is now broken. NuGet works, but the package doesn't compile in my solution. A nice idea, if it worked. – philw Nov 27 '14 at 13:50
1

Seems like a good opportunity for Aggregate. This is designed to be readable, not necessarily especially fast.

someString
.Aggregate(
   new StringBuilder(),
   (str, ch) => {
      if (char.IsUpper(ch) && str.Length > 0)
         str.Append(" ");
      str.Append(ch);
      return str;
   }
).ToString();
Dave Cousineau
1

I found a lot of these answers rather obtuse. I haven't fully tested my solution, but it works for what I need, should handle acronyms, and is much more compact and readable than the others, IMO:

private string CamelCaseToSpaces(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            stringBuilder.Append(s[i]);

            int nextChar = i + 1;
            if (nextChar < s.Length && char.IsUpper(s[nextChar]) && !char.IsUpper(s[i]))
            {
                stringBuilder.Append(" ");
            }
        }

        return stringBuilder.ToString();
    }
Adam Short
1
replaceAll("(?<=[^^\\p{Uppercase}])(?=[\\p{Uppercase}])"," ");
Randyaa
0

Here's my solution, based on Binary Worrier's suggestion and building in Richard Priddy's comments, but also taking into account that white space may exist in the provided string, so it won't add white space next to existing white space.

public string AddSpacesBeforeUpperCase(string nonSpacedString)
{
    if (string.IsNullOrEmpty(nonSpacedString))
        return string.Empty;

    StringBuilder newText = new StringBuilder(nonSpacedString.Length * 2);
    newText.Append(nonSpacedString[0]);

    for (int i = 1; i < nonSpacedString.Length; i++)
    {
        char currentChar = nonSpacedString[i];

        // Existing whitespace is copied through as-is; we never add another next to it
        if (char.IsWhiteSpace(currentChar))
        {
            newText.Append(currentChar);
            continue;
        }

        char previousChar = nonSpacedString[i - 1];
        char nextChar = i < nonSpacedString.Length - 1 ? nonSpacedString[i + 1] : nonSpacedString[i];

        // Add a space before a capital unless it sits inside a run of capitals
        // or whitespace already precedes or follows it
        if (char.IsUpper(currentChar)
            && !char.IsWhiteSpace(previousChar)
            && !char.IsWhiteSpace(nextChar)
            && !(char.IsUpper(previousChar) && char.IsUpper(nextChar)))
        {
            newText.Append(' ');
        }

        newText.Append(currentChar);
    }

    return newText.ToString();
}
Yetiish
0

For anyone who is looking for a C++ function answering this same question, you can use the following. This is modeled after the answer given by @Binary Worrier. This method just preserves Acronyms automatically.

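#include <cctype>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

void AddSpacesToSentence(string& testString)
{
    // Nothing to do for an empty string (at(0) would throw).
    if (testString.empty())
        return;

    stringstream ss;
    ss << testString.at(0);
    for (auto it = testString.begin() + 1; it != testString.end(); ++it)
    {
        size_t index = it - testString.begin();
        char c = (*it);
        if (isupper(static_cast<unsigned char>(c)))
        {
            char prev = testString.at(index - 1);
            if (isupper(static_cast<unsigned char>(prev)))
            {
                // Inside a run of capitals: split only before the word that follows an acronym.
                if (index < testString.length() - 1)
                {
                    char next = testString.at(index + 1);
                    if (!isupper(static_cast<unsigned char>(next)) && next != ' ')
                    {
                        ss << ' ';
                    }
                }
            }
            else if (islower(static_cast<unsigned char>(prev)))
            {
                ss << ' ';
            }
        }

        ss << c;
    }

    cout << ss.str() << endl;
}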

The test strings I used for this function, and the results, are:

  • "helloWorld" -> "hello World"
  • "HelloWorld" -> "Hello World"
  • "HelloABCWorld" -> "Hello ABC World"
  • "HelloWorldABC" -> "Hello World ABC"
  • "ABCHelloWorld" -> "ABC Hello World"
  • "ABC HELLO WORLD" -> "ABC HELLO WORLD"
  • "ABCHELLOWORLD" -> "ABCHELLOWORLD"
  • "A" -> "A"
lbrendanl
0

A C# solution for an input string that consists only of ASCII characters. The regex incorporates negative lookbehind to ignore a capital (upper case) letter that appears at the beginning of the string. Uses Regex.Replace() to return the desired string.

Also see regex101.com demo.

using System;
using System.Text.RegularExpressions;

public class RegexExample
{
    public static void Main()
    {
        var text = "ThisStringHasNoSpacesButItDoesHaveCapitals";

        // Use negative lookbehind to match all capital letters
        // that do not appear at the beginning of the string.
        var pattern = "(?<!^)([A-Z])";

        var rgx = new Regex(pattern);
        var result = rgx.Replace(text, " $1");
        Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
    }
}

Expected Output:

Input: [ThisStringHasNoSpacesButItDoesHaveCapitals]
Output: [This String Has No Spaces But It Does Have Capitals]

Update: Here's a variation that will also handle acronyms (sequences of upper-case letters).

Also see regex101.com demo and ideone.com demo.

using System;
using System.Text.RegularExpressions;

public class RegexExample
{
    public static void Main()
    {
        var text = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";

        // Use positive lookbehind to locate all upper-case letters
        // that are preceded by a lower-case letter.
        var patternPart1 = "(?<=[a-z])([A-Z])";

        // Use positive lookbehind and lookahead to locate all
        // upper-case letters that are preceded by an upper-case
        // letter and followed by a lower-case letter.
        var patternPart2 = "(?<=[A-Z])([A-Z])(?=[a-z])";

        var pattern = patternPart1 + "|" + patternPart2;
        var rgx = new Regex(pattern);
        var result = rgx.Replace(text, " $1$2");

        Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
    }
}

Expected Output:

Input: [ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ]
Output: [This String Has No Spaces ASCII But It Does Have Capitals LINQ]
DavidRR
0

This one handles acronyms and acronym plurals and is a bit faster than the accepted answer:

public string Sentencify(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return string.Empty;

    string final = string.Empty;
    for (int i = 0; i < value.Length; i++)
    {
        if (i != 0 && Char.IsUpper(value[i]))
        {
            if (!Char.IsUpper(value[i - 1]))
                final += " ";
            else if (i < (value.Length - 1))
            {
                // value[i + 1] is in range here; check the bound before reading value[i + 2].
                if (!Char.IsUpper(value[i + 1]) && !(value[i + 1] == 's' ||
                                                     (i + 2 < value.Length && value[i + 1] == 'e' && value[i + 2] == 's')))
                    final += " ";
            }
        }

        final += value[i];
    }

    return final;
}

Passes these tests:

string test1 = "RegularOTs";
string test2 = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
string test3 = "ThisStringHasNoSpacesButItDoesHaveCapitals";
Serj Sagan
  • the accepted answer deals with the case where value is null – Chris F Carroll Dec 15 '14 at 09:59
  • This adds an extra space in front of the output, ie HireDate => " Hire Date". Needs a final.TrimStart or something. I think that's what one of the other answers is pointing out below but because of the reordering I'm not sure if he was talking to you since his answer is RegEx based. – b_levitt Feb 06 '15 at 22:52
  • Good catch...should have added a start and end marker to my tests...fixed now. – Serj Sagan Feb 09 '15 at 17:53
  • Similar to other solution posted here, it fails with string "RegularOTs". It returns "Regular O Ts" – Patee Gutee Jan 29 '18 at 19:33
  • Thanks for bringing up abbreviation plurals, I've updated to work for this as well. – Serj Sagan Jan 29 '18 at 20:50
0

Here is a more thorough solution that doesn't put spaces in front of words:

Note: I have used multiple regexes (not concise, but this also handles acronyms and single-letter words).

Dim s As String = "ThisStringHasNoSpacesButItDoesHaveCapitals"
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2") ' repeat a second time

In:

"ThisStringHasNoSpacesButItDoesHaveCapitals"
"IAmNotAGoat"
"LOLThatsHilarious!"
"ThisIsASMSMessage"

Out:

"This String Has No Spaces But It Does Have Capitals"
"I Am Not A Goat"
"LOL Thats Hilarious!"
"This Is ASMS Message" // (Difficult to handle single letter words when they are next to acronyms.)
CrazyTim
0

All the previous responses looked overcomplicated.

I had a string containing a mixture of capitals and underscores, so I used string.Replace() to turn each _ into " " and then used the following loop to add a space before each capital letter.

for (int i = 0; i < result.Length; i++)
{
    if (char.IsUpper(result[i]))
    {
        if (i > 0) // don't add a space if the string starts with a capital
        {
            result = result.Insert(i, " ");
            i++; // Required: skip past the capital that just moved one place right,
                 // otherwise the loop keeps inserting spaces before it forever.
        }
    }
}
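The two steps described above (replace underscores, then space out capitals) could be wrapped into one helper along these lines. This is only a sketch: the class and method names are hypothetical, and unlike the loop above it skips the extra space when the previous character is already a space, which is an assumption about the desired output.

```csharp
using System;
using System.Text;

public static class UnderscoreAndCapsExample
{
    // Hypothetical helper: replaces '_' with ' ' and then puts a space
    // before each capital letter that is not already preceded by one.
    public static string Humanize(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        string replaced = input.Replace('_', ' ');
        var sb = new StringBuilder(replaced.Length * 2);

        for (int i = 0; i < replaced.Length; i++)
        {
            char c = replaced[i];
            // Skip index 0 and avoid doubling a space that came from '_'.
            if (i > 0 && char.IsUpper(c) && replaced[i - 1] != ' ')
                sb.Append(' ');
            sb.Append(c);
        }

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Humanize("Order_DateCreated")); // Order Date Created
        Console.WriteLine(Humanize("FirstName"));         // First Name
    }
}
```

Building into a StringBuilder avoids the repeated string.Insert calls, each of which allocates a new string.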
johnny 5
st3_121
0

Inspired by Binary Worrier's answer, I took a swing at this.

Here's the result:

/// <summary>
/// String Extension Method
/// Adds white space to strings based on Upper Case Letters
/// </summary>
/// <example>
/// strIn => "HateJPMorgan"
/// preserveAcronyms false => "Hate JPMorgan"
/// preserveAcronyms true => "Hate JP Morgan"
/// </example>
/// <param name="strIn">to evaluate</param>
/// <param name="preserveAcronyms" >determines saving acronyms (Optional => false) </param>
public static string AddSpaces(this string strIn, bool preserveAcronyms = false)
{
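    if (string.IsNullOrWhiteSpace(strIn))
        return String.Empty;

    // A single character needs no spacing, and the code below assumes Length >= 2.
    if (strIn.Length == 1)
        return strIn;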

    var stringBuilder = new StringBuilder(strIn.Length * 2)
        .Append(strIn[0]);

    int i;

    for (i = 1; i < strIn.Length - 1; i++)
    {
        var c = strIn[i];

        if (Char.IsUpper(c) && (Char.IsLower(strIn[i - 1]) || (preserveAcronyms && Char.IsLower(strIn[i + 1]))))
            stringBuilder.Append(' ');

        stringBuilder.Append(c);
    }

    return stringBuilder.Append(strIn[i]).ToString();
}

I tested this with a Stopwatch, running 10,000,000 iterations over various string lengths and combinations.

On average it was about 50% (maybe a bit more) faster than Binary Worrier's answer.

João Sequeira
0
    private string GetProperName(string Header)
    {
        // Return early for null/empty input (Header[0] below would throw) or a single capital.
        if (string.IsNullOrEmpty(Header) || Header.Count(c => Char.IsUpper(c)) == 1)
        {
            return Header;
        }
        else
        {
            string ReturnHeader = Header[0].ToString();
            for(int i=1; i<Header.Length;i++)
            {
                if (char.IsLower(Header[i-1]) && char.IsUpper(Header[i]))
                {
                    ReturnHeader += " " + Header[i].ToString();
                }
                else
                {
                    ReturnHeader += Header[i].ToString();
                }
            }

            return ReturnHeader;
        }
    }
0

An implementation with fold, also known as Aggregate:

    public static string SpaceCapitals(this string arg) =>
       new string(arg.Aggregate(new List<Char>(),
                      (accum, x) => 
                      {
                          if (Char.IsUpper(x) &&
                              accum.Any() &&
                              // prevent double spacing
                              accum.Last() != ' ' &&
                              // prevent spacing acronyms (ASCII, SCSI)
                              !Char.IsUpper(accum.Last()))
                          {
                              accum.Add(' ');
                          }

                          accum.Add(x);

                          return accum;
                      }).ToArray());

Beyond what the question asks, this implementation also preserves leading, inner, and trailing spaces as well as acronyms, for example:

" SpacedWord " => " Spaced Word ",  

"Inner Space" => "Inner Space",  

"SomeACRONYM" => "Some ACRONYM".
Artur A
0

A simple way to add a space after each run of lower-case letters, digits, or (optionally) symbols.

    string AddSpacesToSentence(string value, bool spaceLowerChar = true, bool spaceDigitChar = true, bool spaceSymbolChar = false)
    {
        var result = "";

        for (int i = 0; i < value.Length; i++)
        {
            char currentChar = value[i];
            char nextChar = value[i < value.Length - 1 ? i + 1 : value.Length - 1];

            if (spaceLowerChar && char.IsLower(currentChar) && !char.IsLower(nextChar))
            {
                result += value[i] + " ";
            }
            else if (spaceDigitChar && char.IsDigit(currentChar) && !char.IsDigit(nextChar))
            {
                result += value[i] + " ";
            }
            else if(spaceSymbolChar && char.IsSymbol(currentChar) && !char.IsSymbol(nextChar))
            {
                // Add the space here too; without it the spaceSymbolChar flag has no effect.
                result += value[i] + " ";
            }
            else
            {
                result += value[i];
            }
        }

        return result;
    }
Prince Owusu
  • 1
    Code-only answers are discouraged. Please click on [edit] and add some words summarising how your code addresses the question, or perhaps explain how your answer differs from the previous answer/answers. [From Review](https://stackoverflow.com/review/late-answers/22687171) – Nick Apr 08 '19 at 01:37
0

I'd like to use this one, thanks to @Sean:

string InsertSpace(string text) {
        return string.Join("", text.Select(ch => (char.IsUpper(ch) ? " " : "") + ch));
    }

In 2023, I'd rather use this one, which keeps runs of uppercase letters together:

class USpace {
    public static string Create(string text) {
        int l = 0; // characters seen since the last uppercase letter
        return string.Join("", text.Select(ch => {
            if (!char.IsUpper(ch)) { l++; return ch.ToString(); }
            string res = (l == 0 ? "" : " ") + ch;
            l = 0; return res;
        }));
    }
}
hossein sedighian
0

Building on Martin Brown's answer, I also needed to handle numbers. For example, "Location2" and "Jan22" should become "Location 2" and "Jan 22" respectively.

Here is my Regular Expression for doing that, using Martin Brown's answer:

"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})"

Here are a couple of great sites for figuring out what each part means as well:

Java Based Regular Expression Analyzer (but works for most .net regex's)

Action Script Based Analyzer

The above regex won't work on the ActionScript site unless you replace every \p{Ll} with [a-z], every \p{Lu} with [A-Z], and every \p{Nd} with [0-9].
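For illustration, here is a minimal sketch of how the pattern above might be applied in C#. It assumes a replacement of " $0" (a space followed by the whole match); the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberAwareSpacing
{
    // The pattern from this answer, written as a verbatim string.
    private const string Pattern =
        @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})";

    // Hypothetical wrapper: prefixes every match with a single space.
    public static string AddSpaces(string value) =>
        Regex.Replace(value, Pattern, " $0");

    public static void Main()
    {
        Console.WriteLine(AddSpaces("Location2")); // Location 2
        Console.WriteLine(AddSpaces("Jan22"));     // Jan 22
    }
}
```

Only the first digit of a run is matched, because the third alternative requires a letter immediately before the digit, so "22" stays together.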

Daryl