171

I'd like to write a method that converts CamelCase into a human-readable name.

Here's the test case:

public void testSplitCamelCase() {
    assertEquals("lowercase", splitCamelCase("lowercase"));
    assertEquals("Class", splitCamelCase("Class"));
    assertEquals("My Class", splitCamelCase("MyClass"));
    assertEquals("HTML", splitCamelCase("HTML"));
    assertEquals("PDF Loader", splitCamelCase("PDFLoader"));
    assertEquals("A String", splitCamelCase("AString"));
    assertEquals("Simple XML Parser", splitCamelCase("SimpleXMLParser"));
    assertEquals("GL 11 Version", splitCamelCase("GL11Version"));
}
riddle_me_this
Frederik
  • 5
    First, you will need to specify the rules of the conversion. For instance, how does `PDFLoader` become `PDF Loader`? – Jørn Schou-Rode Apr 01 '10 at 10:47
  • 3
    I call that format "PascalCase". In "camelCase" the first letter should be lowercase. At least as far as developers are concerned. http://msdn.microsoft.com/en-us/library/x2dbyw72(v=vs.71).aspx – Muhd Nov 17 '11 at 19:55

12 Answers

359

This works with your test cases:

static String splitCamelCase(String s) {
   return s.replaceAll(
      String.format("%s|%s|%s",
         "(?<=[A-Z])(?=[A-Z][a-z])",
         "(?<=[^A-Z])(?=[A-Z])",
         "(?<=[A-Za-z])(?=[^A-Za-z])"
      ),
      " "
   );
}

Here's a test harness:

    String[] tests = {
        "lowercase",        // [lowercase]
        "Class",            // [Class]
        "MyClass",          // [My Class]
        "HTML",             // [HTML]
        "PDFLoader",        // [PDF Loader]
        "AString",          // [A String]
        "SimpleXMLParser",  // [Simple XML Parser]
        "GL11Version",      // [GL 11 Version]
        "99Bottles",        // [99 Bottles]
        "May5",             // [May 5]
        "BFG9000",          // [BFG 9000]
    };
    for (String test : tests) {
        System.out.println("[" + splitCamelCase(test) + "]");
    }

It uses a zero-length matching regex with lookbehind and lookahead to find where to insert spaces. There are three patterns, and I use String.format to put them together so the result is more readable.

The three patterns are:

UC behind me, UC followed by LC in front of me

  XMLParser   AString    PDFLoader
    /\        /\           /\

non-UC behind me, UC in front of me

 MyClass   99Bottles
  /\        /\

Letter behind me, non-letter in front of me

 GL11    May5    BFG9000
  /\       /\      /\
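To see where each alternative fires, the three lookarounds can be applied one at a time. This is only a minimal sketch: the class name `LookaroundDemo`, the constant names, and the `insertSpaces` helper are made up for illustration; only the pattern strings come from the answer.

```java
class LookaroundDemo {
    // Pattern strings copied from the answer; the constant names are illustrative.
    static final String UC_THEN_UC_LC = "(?<=[A-Z])(?=[A-Z][a-z])";
    static final String NON_UC_THEN_UC = "(?<=[^A-Z])(?=[A-Z])";
    static final String LETTER_THEN_NON_LETTER = "(?<=[A-Za-z])(?=[^A-Za-z])";

    // Hypothetical helper: every match is zero-length, so replaceAll
    // inserts a space without consuming any characters of the input.
    static String insertSpaces(String input, String boundary) {
        return input.replaceAll(boundary, " ");
    }

    public static void main(String[] args) {
        System.out.println(insertSpaces("XMLParser", UC_THEN_UC_LC));        // XML Parser
        System.out.println(insertSpaces("99Bottles", NON_UC_THEN_UC));       // 99 Bottles
        System.out.println(insertSpaces("BFG9000", LETTER_THEN_NON_LETTER)); // BFG 9000
    }
}
```

Each pattern handles only one kind of boundary, which is why `GL11Version` needs two of them: the third splits `GL|11` and the second splits `11|Version`.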

polygenelubricants
  • 1
    The concept works in C# as well (with the same regular expressions, but a little different regular-expression framework, of course). Excellent work. Thanks! – gmm Jan 07 '13 at 20:08
  • Doesn't seem to be working for me on Python, it could be because the regex engine is not the same. I'll have to try doing something less elegant, I'm afraid. :) – MarioVilas Sep 03 '13 at 18:49
  • 2
    Could someone please explain what %s|%s|%s mean with respect to the testcases and also generally? – Ari53nN3o Nov 11 '14 at 23:11
  • 1
    @Ari53nN3o: The `%s`s are placeholders for the [`String.format(String format, args...)`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#format%28java.lang.String,%20java.lang.Object...%29) arguments. You can also refer to the arguments by index: `String.format("%1$s|%2$s|%3$s", ...)` – Mr. Polywhirl Jan 21 '15 at 03:05
  • How will this work in C#? There is no `replaceAll`, and I also want to split where the string contains a "`.`". – sarojanand Mar 23 '15 at 18:21
  • I think it's best to put the output of `String.format` in a constant if this function will be called a lot. – herman Jul 15 '16 at 16:20
  • Nice demonstration of what can be achieved with good use of regular expressions, even though the algorithm described by @Ralph below (the one that uses commons-lang) is about 20 times faster, more succinct, and doesn't reinvent the wheel. I suggest readers use that one unless their language is not Java. – Clint Eastwood Nov 15 '16 at 15:19
  • downvoted because it does not work for the "camelCase" – j.con Jun 21 '17 at 19:21
  • @polygenelubricants the test fails for input like "simpleXmlParser": it does not capitalize the first letter `s`. – Emmanuel Njorodongo Jan 07 '22 at 12:59
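Following herman's suggestion to build the expression only once, and to illustrate what the `%s` placeholders do, here is a minimal sketch. The class name `CamelCaseSplitter` and the constant `WORD_BOUNDARY` are hypothetical; the pattern strings are the ones from the answer.

```java
import java.util.regex.Pattern;

class CamelCaseSplitter {
    // String.format replaces each %s with the corresponding argument, so the
    // result is simply the three patterns joined by "|" (regex alternation).
    // Compiling once into a constant avoids re-parsing the regex on every call.
    private static final Pattern WORD_BOUNDARY = Pattern.compile(
        String.format("%s|%s|%s",
            "(?<=[A-Z])(?=[A-Z][a-z])",
            "(?<=[^A-Z])(?=[A-Z])",
            "(?<=[A-Za-z])(?=[^A-Za-z])"));

    static String splitCamelCase(String s) {
        return WORD_BOUNDARY.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("SimpleXMLParser")); // Simple XML Parser
        System.out.println(splitCamelCase("GL11Version"));     // GL 11 Version
    }
}
```

The behaviour should match the `String.replaceAll` version, since `replaceAll` compiles the same pattern internally on each call.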
152

You can do it using org.apache.commons.lang.StringUtils (in Commons Lang 3 the package is org.apache.commons.lang3):

StringUtils.join(
     StringUtils.splitByCharacterTypeCamelCase("ExampleTest"),
     ' '
);
JoeG
Ralph
  • 12
    This solution is much better than the most upvoted one because: a) It doesn't reinvent the wheel: commons-lang is a de facto standard, it works well, and it is very focused on performance. b) When the conversion is done many times, this method is much faster than the regex-based one. My benchmark for executing the aforementioned tests 100,000 times: the regex-based method took 4820 milliseconds; the commons-lang-based method took 232 milliseconds. That's about 20 times faster than the one that uses regex. – Clint Eastwood Nov 15 '16 at 15:13
  • 3
    I definitely agree with Clint on this one, this should be the accepted answer. Performance is a thing but using a battle-tested library is definitely a good programming practice. – Julien Nov 28 '16 at 20:42
  • 1
    Or by using Java 8's `String.join()` method: `String.join(" ", StringUtils.splitByCharacterTypeCamelCase("ExampleTest"));` – dk7 Jan 25 '18 at 22:36
28

A neater, shorter solution:

StringUtils.capitalize(StringUtils.join(StringUtils.splitByCharacterTypeCamelCase("yourCamelCaseText"), StringUtils.SPACE)); // Your Camel Case Text
Sahil Chhabra
12

If you don't like "complicated" regexes and aren't at all bothered about efficiency, you can achieve the same effect in three stages, as in this example:

String name = 
    camelName.replaceAll("([A-Z][a-z]+)", " $1") // Words beginning with UC
             .replaceAll("([A-Z][A-Z]+)", " $1") // "Words" of only UC
             .replaceAll("([^A-Za-z ]+)", " $1") // "Words" of non-letters
             .trim();

It passes all the test cases above, including those with digits.

As I said, this isn't as good as the single regular expression used in some other answers here, but someone might well find it useful.

jlb83
6

You can use org.modeshape.common.text.Inflector.

Specifically:

String humanize(String lowerCaseAndUnderscoredWords,
    String... removableTokens) 

Capitalizes the first word, turns underscores into spaces, and strips a trailing "_id" and any supplied removable tokens.

Maven artifact is: org.modeshape:modeshape-common:2.3.0.Final

on JBoss repository: https://repository.jboss.org/nexus/content/repositories/releases

Here's the JAR file: https://repository.jboss.org/nexus/content/repositories/releases/org/modeshape/modeshape-common/2.3.0.Final/modeshape-common-2.3.0.Final.jar

Hendy Irawan
1

This works in .NET... optimize to your liking. I added comments so you can understand what each piece is doing. (RegEx can be hard to understand)

public static string SplitCamelCase(string str)
{
    str = Regex.Replace(str, @"([A-Z])([A-Z][a-z])", "$1 $2"); // Capital followed by a capital and then a lowercase letter.
    str = Regex.Replace(str, @"([a-z])([A-Z])", "$1 $2");      // Lowercase letter followed by a capital.
    str = Regex.Replace(str, @"(\D)(\d)", "$1 $2");            // Non-digit followed by a digit.
    str = Regex.Replace(str, @"(\d)(\D)", "$1 $2");            // Digit followed by a non-digit.
    return str;
}
Xinbi
1

The following Regex can be used to identify the capitals inside words:

"((?<=[a-z0-9])[A-Z]|(?<=[a-zA-Z])[0-9]|(?<=[A-Z])[A-Z](?=[a-z]))"

It matches every capital letter that either follows a lowercase letter or digit, or follows a capital and is itself followed by a lowercase letter, and it matches every digit that follows a letter.

How to insert a space before them is beyond my Java skills =)

Edited to include the digit case and the PDF Loader case.
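One way to insert the space in Java is to let the outer group capture the matched character and put a space in front of it in the replacement. This is a minimal sketch with made-up class and method names; the pattern is the one from this answer, written with `[0-9]` closed by a single `]`.

```java
class CapitalInserter {
    // The outer group captures the single character that starts a new word.
    static final String WORD_START =
        "((?<=[a-z0-9])[A-Z]|(?<=[a-zA-Z])[0-9]|(?<=[A-Z])[A-Z](?=[a-z]))";

    static String splitCamelCase(String s) {
        // "$1" puts the captured character back, right after the added space.
        return s.replaceAll(WORD_START, " $1");
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("PDFLoader"));   // PDF Loader
        System.out.println(splitCamelCase("GL11Version")); // GL 11 Version
    }
}
```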

Jens
  • @Yaneeve: I just saw the digits... this might make things more complicated. Probably another Regex to catch those would be the easy way. – Jens Apr 01 '10 at 10:50
  • @Jens: Will it match the `L` in `PDFLoader`? – Jørn Schou-Rode Apr 01 '10 at 10:52
  • how about (?<=[a-z0-9])[A-Z0-9] ? – Yaneeve Apr 01 '10 at 10:52
  • @Jørn: Good point! Need to think about that. =) .... ok, edited something in to catch those. – Jens Apr 01 '10 at 10:54
  • @Yaneeve: That will unfortunately match the second 1 in 11. – Jens Apr 01 '10 at 10:59
  • 3
    Now, I vastly admire your Regex skill, but I'd hate to have to maintain that. – Chris Knight Apr 01 '10 at 11:07
  • 1
    @Chris: Yep, thats true. Regex is more of a write-only language. =) Although this particular expression is not very hard to read, if you read `|` as "or". Well... maybe it is... I've seen worse =/ – Jens Apr 01 '10 at 11:18
  • @Jens, @Chris Knight: Regexes do not **have** to be hard to read or maintain. [Here](http://stackoverflow.com/questions/4044946/regex-to-split-html-tags/4045840#4045840), [here](http://stackoverflow.com/questions/4077896/perl-or-python-convert-date-from-dd-mm-yyyy-to-yyyy-mm-dd/4078817#4078817), and [here](http://stackoverflow.com/questions/4031112/regular-expression-matching) I show how to write maintainable patterns. *“No programming language can be maintainable that forbids white space, comments, subroutines, or alphanumeric identifiers. So use all those things in your patterns.”* – tchrist Nov 07 '10 at 12:51
1

I think you will have to iterate over the string and detect changes from lowercase to uppercase, uppercase to lowercase, alphabetic to numeric, and numeric to alphabetic. On every change you detect, insert a space, with one exception: on a change from uppercase to lowercase, insert the space one character earlier.
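A minimal Java sketch of this scan, under some assumptions: the class and method names are made up, the input is expected to contain only letters and digits, and the "one character earlier" rule is read as "a run of capitals followed by a lowercase letter is split before its last capital".

```java
class CamelCaseScanner {
    static String splitCamelCase(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            if (i > 0 && startsNewWord(s, i)) {
                out.append(' ');
            }
            out.append(s.charAt(i));
        }
        return out.toString();
    }

    // Decides whether a space belongs between s[i - 1] and s[i].
    private static boolean startsNewWord(String s, int i) {
        char prev = s.charAt(i - 1);
        char cur = s.charAt(i);
        if (Character.isDigit(prev) != Character.isDigit(cur)) {
            return true; // alphabetic <-> numeric change
        }
        if (!Character.isUpperCase(cur)) {
            return false; // lowercase letters and later digits never start a word
        }
        if (!Character.isUpperCase(prev)) {
            return true; // lowercase -> uppercase change
        }
        // Inside a run of capitals: split only before the last capital of the
        // run, i.e. when the next character is lowercase ("PDF|Loader").
        return i + 1 < s.length() && Character.isLowerCase(s.charAt(i + 1));
    }
}
```

Looking one character ahead is what implements the exception: the uppercase-to-lowercase change is detected at `Lo` in `PDFLoader`, but the space is placed before the `L`.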

Felix
0

I took the Regex from polygenelubricants and turned it into an extension method on objects:

    /// <summary>
    /// Turns a given object into a sentence by:
    /// Converting the given object into a <see cref="string"/>.
    /// Adding spaces before each capital letter except for the first letter of the string representation of the given object.
    /// Makes the entire string lower case except for the first word and any acronyms.
    /// </summary>
    /// <param name="original">The object to turn into a proper sentence.</param>
    /// <returns>A string representation of the original object that reads like a real sentence.</returns>
    public static string ToProperSentence(this object original)
    {
        Regex addSpacesAtCapitalLettersRegEx = new Regex(@"(?<=[A-Z])(?=[A-Z][a-z]) | (?<=[^A-Z])(?=[A-Z]) | (?<=[A-Za-z])(?=[^A-Za-z])", RegexOptions.IgnorePatternWhitespace);
        string[] words = addSpacesAtCapitalLettersRegEx.Split(original.ToString());
        if (words.Length > 1)
        {
            List<string> wordsList = new List<string> { words[0] };
            wordsList.AddRange(words.Skip(1).Select(word => word.Equals(word.ToUpper()) ? word : word.ToLower()));
            words = wordsList.ToArray();
        }
        return string.Join(" ", words);
    }

This turns everything into a readable sentence. It does a ToString on the object passed. Then it uses the Regex given by polygenelubricants to split the string. Then it ToLowers each word except for the first word and any acronyms. Thought it might be useful for someone out there.
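For readers working in Java rather than C#, the same idea could be sketched roughly as follows. This is a loose port, not the author's code: the class name `ProperSentence` and the constant `BOUNDARY` are made up, and Java has no extension methods, so it is a plain static method.

```java
import java.util.Locale;

class ProperSentence {
    // Same three zero-width alternatives as the regex used in the C# version.
    private static final String BOUNDARY =
        "(?<=[A-Z])(?=[A-Z][a-z])|(?<=[^A-Z])(?=[A-Z])|(?<=[A-Za-z])(?=[^A-Za-z])";

    static String toProperSentence(Object original) {
        String[] words = original.toString().split(BOUNDARY);
        StringBuilder sentence = new StringBuilder(words[0]); // first word kept as-is
        for (int i = 1; i < words.length; i++) {
            String word = words[i];
            // A word equal to its own upper-case form is treated as an acronym
            // (this also leaves digit-only words such as "11" untouched).
            boolean acronym = word.equals(word.toUpperCase(Locale.ROOT));
            sentence.append(' ').append(acronym ? word : word.toLowerCase(Locale.ROOT));
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        System.out.println(toProperSentence("SimpleXMLParser")); // Simple XML parser
    }
}
```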

vbullinger
0

For the record, here is an almost (*) compatible Scala version:

  object Str { def unapplySeq(s: String): Option[Seq[Char]] = Some(s) }

  def splitCamelCase(str: String) =
    String.valueOf(
      (str + "A" * 2) sliding (3) flatMap {
        case Str(a, b, c) =>
          (a.isUpper, b.isUpper, c.isUpper) match {
            case (true, false, _) => " " + a
            case (false, true, true) => a + " "
            case _ => String.valueOf(a)
          }
      } toArray
    ).trim

Once compiled it can be used directly from Java if the corresponding scala-library.jar is in the classpath.

(*) it fails for the input "GL11Version" for which it returns "G L11 Version".

gerferra
-2

I'm not a regex ninja, so I'd iterate over the string, keeping the indexes of the current position being checked & the previous position. If the current position is a capital letter, I'd insert a space after the previous position and increment each index.
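A minimal Java sketch of this approach, with made-up class and method names. It builds a new string instead of inserting into the original, which avoids the index bookkeeping. Note that treating every capital as the start of a word also splits acronyms.

```java
class NaiveCapitalSplitter {
    static String splitAtCapitals(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char cur = s.charAt(i);
            // Every capital except the very first character starts a new word.
            if (i > 0 && Character.isUpperCase(cur)) {
                out.append(' ');
            }
            out.append(cur);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitAtCapitals("MyClass")); // My Class
        System.out.println(splitAtCapitals("HTML"));    // H T M L
    }
}
```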

Joel
-3

http://code.google.com/p/inflection-js/

You could chain the String.underscore().humanize() methods to take a CamelCase string and convert it into a human readable string.

BeesonBison