Regex for converting CamelCase to camel_case in java

Question

I understand why the regex below does not give the desired output: converting a string like FooBar should give Foo_Bar, but it gives Foo_Bar_ instead. I could strip the trailing underscore with String.substring, e.g. substring(0, string.length() - 1), or otherwise remove the last character, but I think there is a better solution for this scenario.

Here is the code:

String regex = "([A-Z][a-z]+)";
String replacement = "$1_";

"CamelCaseToSomethingElse".replaceAll(regex, replacement); 

/*
outputs: Camel_Case_To_Something_Else_
desired output: Camel_Case_To_Something_Else
*/

Question: Is there a neater way to get the desired output?

This question is similar to http://stackoverflow.com/questions/4886091/insert-space-after-capital-letter — Paul Vargas, Apr 25 '12 at 06:21

score 202 · Accepted Answer · edited Apr 11 '19 at 22:22

See this question and CaseFormat from Guava.

In your case, something like:

CaseFormat.UPPER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, "SomeInput");

edited Apr 11 '19 at 22:22

mkobit

answered Apr 25 '12 at 06:23

@eliocs the question was not tagged android and "neater way".. Thanks for the downvote anyway ;) – Nov 06 '14 at 19:17
CaseFormat link is offline. Replacement is [here](https://github.com/google/guava/blob/master/guava/src/com/google/common/base/CaseFormat.java) – Anticom May 22 '16 at 13:26
There is a special case in the Guava function that not every user may want: for the input "Ph_D" you get "ph__d", with two underscores. I am noting it here so that people are aware of it. – Govan Feb 03 '21 at 23:29
It doesn't seem like adding a dependency should be the solution. You're not teaching a man to fish, just giving him one. – ekydfejj Nov 09 '22 at 01:12

score 74 · Answer 2 · edited Jan 21 '16 at 12:44

Capture the lowercase letter and the uppercase run that follows it as two groups, and it will work:

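public class Main
{
    public static void main(String[] args)
    {
        // Group 1: a lowercase letter; group 2: the run of capitals that follows it
        String regex = "([a-z])([A-Z]+)";
        // Put an underscore between the two groups
        String replacement = "$1_$2";
        System.out.println("CamelCaseToSomethingElse"
                           .replaceAll(regex, replacement)
                           .toLowerCase()); // prints camel_case_to_something_else
    }
}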

edited Jan 21 '16 at 12:44

Rafael Winterhalter

answered Apr 25 '12 at 06:44

clevertension

Note: If single letter words are permitted in the input String, e.g. "thisIsATest", the above code will print "this_is_atest". Guava, in the accepted answer, results in "this_is_a_test". – DtotheK Nov 01 '18 at 09:55
This one will not work on a name that starts with a run of capitals, e.g. `IBMIsMyCompany`. – User3301 Jun 16 '20 at 04:42
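
The two edge cases reported in the comments can be reproduced with a short check. This is only a sketch: the class and method names (`TwoGroupDemo`, `convert`) are made up for illustration, and only the pattern and replacement come from the answer.

```java
// Hypothetical demo class; only the regex and replacement come from the answer.
class TwoGroupDemo {
    static String convert(String input) {
        // Insert "_" between a lowercase letter and the run of capitals after it.
        return input.replaceAll("([a-z])([A-Z]+)", "$1_$2").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(convert("CamelCaseToSomethingElse")); // camel_case_to_something_else
        System.out.println(convert("thisIsATest"));              // this_is_atest
        System.out.println(convert("IBMIsMyCompany"));           // ibmis_my_company
    }
}
```

Both failures have the same cause: the pattern only sees a lowercase-to-uppercase transition, so it cannot tell where a run of capitals ends and the next word begins.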

score 44 · Answer 3 · edited Feb 27 '15 at 19:54

You can use the code snippet below:

String replaceAll = key.replaceAll("(.)(\\p{Upper})", "$1_$2").toLowerCase();

edited Feb 27 '15 at 19:54

MuffinMan

answered Aug 12 '13 at 18:54

Sandeep Vaid

What if my string contains a number - mode3 ends up as mode3, whereas I would want mode_3. – Mike Stoddart Sep 13 '17 at 15:25
It doesn't convert camel case like `MyUUID` to underscore properly, I got `my_uu_id`. – User3301 Jun 16 '20 at 04:09
@Mike/@User3301 Try this `String replaceAll = key.replaceAll("(.)(\\p{Upper}+|\\d+)", "$1_$2").toLowerCase();` – Chetan Narsude Sep 03 '21 at 01:52
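
To see what the variant from the last comment changes, the two patterns can be run side by side. This is a rough sketch; `UpperRunDemo`, `original` and `variant` are illustrative names, not part of the answer or the comment.

```java
// Hypothetical demo class comparing the answer's pattern with the commenter's variant.
class UpperRunDemo {
    // Pattern from the answer: handles one capital at a time.
    static String original(String key) {
        return key.replaceAll("(.)(\\p{Upper})", "$1_$2").toLowerCase();
    }

    // Variant from the comment: takes a whole run of capitals, or a run of digits, at once.
    static String variant(String key) {
        return key.replaceAll("(.)(\\p{Upper}+|\\d+)", "$1_$2").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(original("MyUUID"));   // my_uu_id
        System.out.println(variant("MyUUID"));    // my_uuid
        System.out.println(original("mode3"));    // mode3
        System.out.println(variant("mode3"));     // mode_3
        System.out.println(variant("MyUUIDNot")); // my_uuidnot
    }
}
```

The last line shows a remaining limitation: because the variant swallows the whole run of capitals, an acronym followed by another word (`MyUUIDNot`) is not separated from it.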

score 7 · Answer 4 · answered Jun 13 '18 at 12:56

I can't provide RegEx, it would be insanely complex anyway.

Try this function with automatic recognition of acronyms.

Unfortunately Guava lib doesn't auto detect upper case acronyms, so "bigCAT" would be converted to "BIG_C_A_T"

/**
 * Convert to UPPER_UNDERSCORE format detecting upper case acronyms
 */
private String upperUnderscoreWithAcronyms(String name) {
    StringBuilder result = new StringBuilder();
    boolean begin = true;
    boolean lastUppercase = false;
    for( int i=0; i < name.length(); i++ ) {
        char ch = name.charAt(i);
        if( Character.isUpperCase(ch) ) {
            // is start?
            if( begin ) {
                result.append(ch);
            } else {
                if( lastUppercase ) {
                    // test if end of acronym
                    if( i+1<name.length() ) {
                        char next = name.charAt(i+1);
                        if( Character.isUpperCase(next) ) {
                            // acronym continues
                            result.append(ch);
                        } else {
                            // end of acronym
                            result.append('_').append(ch);
                        }
                    } else {
                        // end of input: the acronym runs to the last character
                        result.append(ch);
                    }
                } else {
                    // last was lowercase, insert _
                    result.append('_').append(ch);
                }
            }
            lastUppercase=true;
        } else {
            result.append(Character.toUpperCase(ch));
            lastUppercase=false;
        }
        begin=false;
    }
    return result.toString();
}

Brett Ryan · Answer 5 · 2014-10-14T16:04:39.227

5

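Why not simply require that the preceding character is neither an underscore nor an uppercase letter? Because the pattern needs a preceding character, it also never matches at the start of the string.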

String text = "CamelCaseToSomethingElse";
System.out.println(text.replaceAll("([^_A-Z])([A-Z])", "$1_$2"));

Note that this version is safe to apply to a string that has already been converted: a capital that already follows an underscore is left alone.
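
One way to check the safety claim is to apply the replacement twice and compare the results. The sketch below assumes nothing beyond the pattern shown above; `NotAfterUnderscoreDemo` and `separate` are illustrative names.

```java
// Hypothetical demo class; the regex and replacement are the ones from the answer.
class NotAfterUnderscoreDemo {
    static String separate(String text) {
        // A capital gets an underscore only if the character before it
        // is neither "_" nor another capital.
        return text.replaceAll("([^_A-Z])([A-Z])", "$1_$2");
    }

    public static void main(String[] args) {
        String once = separate("CamelCaseToSomethingElse");
        String twice = separate(once);
        System.out.println(once);               // Camel_Case_To_Something_Else
        System.out.println(once.equals(twice)); // true
    }
}
```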

edited Oct 14 '14 at 16:04

answered Oct 14 '14 at 10:36

Brett Ryan

Are you trying to use `^` and `$` as anchors? Because their meanings change when you put them in a character class. `[^$_A-Z]` matches any character that's not `$`, `_`, or an uppercase letter, and I don't think that's what you meant. – Alan Moore Oct 14 '14 at 12:37
Not intending as anchors, Am attempting to not match upper character, the `$` was mistakenly added as it's a technique I use on class-names. – Brett Ryan Oct 14 '14 at 16:04

score 4 · Answer 6 · answered Aug 23 '19 at 19:30

I'm not sure it's possible to build something really solid with pure regex, especially if it has to support acronyms.

I have made a small function, inspired by @radzimir's answer, that supports acronyms and non-alphabetic characters:

From https://gist.github.com/ebuildy/cf46a09b1ac43eea17c7621b7617ebcd:

private static String snakeCaseFormat(String name) {
    final StringBuilder result = new StringBuilder();

    boolean lastUppercase = false;

    for (int i = 0; i < name.length(); i++) {
        char ch = name.charAt(i);
        char lastEntry = i == 0 ? 'X' : result.charAt(result.length() - 1);
        if (ch == ' ' || ch == '_' || ch == '-' || ch == '.') {
            lastUppercase = false;

            if (lastEntry == '_') {
                continue;
            } else {
                ch = '_';
            }
        } else if (Character.isUpperCase(ch)) {
            ch = Character.toLowerCase(ch);
            // is start?
            if (i > 0) {
                if (lastUppercase) {
                    // test if end of acronym
                    if (i + 1 < name.length()) {
                        char next = name.charAt(i + 1);
                        if (!Character.isUpperCase(next) && Character.isAlphabetic(next)) {
                            // end of acronym
                            if (lastEntry != '_') {
                                result.append('_');
                            }
                        }
                    }
                } else {
                    // last was lowercase, insert _
                    if (lastEntry != '_') {
                        result.append('_');
                    }
                }
            }
            lastUppercase = true;
        } else {
            lastUppercase = false;
        }

        result.append(ch);
    }
    return result.toString();
}

This is a quality answer; it handles most edge cases. – User3301 Jun 16 '20 at 05:04

Has QUIT--Anony-Mousse · Answer 7 · 2012-04-26T03:32:01.633

Add a zero-width lookahead assertion.

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

Read the documentation for (?=X) etc.

Personally, I would actually split the string, then recombine it. This may even be faster when done right, and it makes the code much easier to understand than regular expression magic. Don't get me wrong: I love regular expressions. But this isn't really a neat regular expression, nor is this transformation a classic regexp task. After all it seems you also want to do lowercase?

An ugly but quick hack would be to replace (.)([A-Z]+) with $1_$2 and then lowercase the whole string afterwards (unless you can do Perl-style extended regexps, where you can lowercase the replacement directly!). Still, I consider splitting at the lower-to-upper transition, then transforming, then joining to be the proper and most readable way of doing this.

So I would split it into chunks matching `[A-Z][a-z]*`, lowercase the first letter, and rejoin them. Or the replacement+lowercase trick I just added to the main reply. — Has QUIT--Anony-Mousse, Apr 26 '12 at 03:29
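
The split, transform and join approach described above could look roughly like this. It is a minimal sketch, not the answerer's own code: the class name `SplitJoinDemo`, the method name `toSnakeCase`, and the choice to split with a lookbehind/lookahead pair are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of "split at the lower-to-upper transition, transform, join".
class SplitJoinDemo {
    static String toSnakeCase(String camel) {
        // Zero-width split point: a lowercase letter behind, an uppercase letter ahead.
        return Arrays.stream(camel.split("(?<=[a-z])(?=[A-Z])"))
                .map(String::toLowerCase)          // transform each chunk
                .collect(Collectors.joining("_")); // rejoin with underscores
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("CamelCaseToSomethingElse")); // camel_case_to_something_else
    }
}
```

Since the split only happens at a lowercase-to-uppercase boundary, runs of capitals stay together, so `thisIsATest` becomes `this_is_atest` with this sketch.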

score 2 · Answer 8 · edited Aug 21 '16 at 17:45

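public class ReplaceFromCameltoSnake {
    public static void main(String[] args) {
        String s1 = "totalAmountWithoutDiscount";
        // Put an underscore before every run of capitals, then lowercase everything
        String replaceString = s1.replaceAll("([A-Z]+)", "_$1").toLowerCase();
        System.out.println(replaceString); // prints total_amount_without_discount
    }
}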

edited Aug 21 '16 at 17:45

Alan Moore

answered Aug 21 '16 at 17:07

abinash sahu

`$1` refers to the first capture group – abinash sahu Aug 21 '16 at 17:09

score 1 · Answer 9 · answered Apr 25 '12 at 06:29

([A-Z][a-z\d]+)(?=([A-Z][a-z\d]+))

Should search for a capital letter followed by lowercase letters. The positive lookahead will look for another word starting with a capital letter followed by lowercase letters but will NOT include it in the match.

Look here: http://regexr.com?30ooo
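
Applied to the string from the question, the pattern above could be used as follows. This is a sketch under the assumption that the replacement is `$1_`, as in the question; `LookaheadDemo` and `separate` are illustrative names.

```java
// Hypothetical demo class; the regex is the one from this answer, with
// backslashes doubled for a Java string literal.
class LookaheadDemo {
    static String separate(String input) {
        // The lookahead requires another capitalised word to follow but does not
        // consume it, so the last word never receives a trailing underscore.
        return input.replaceAll("([A-Z][a-z\\d]+)(?=([A-Z][a-z\\d]+))", "$1_");
    }

    public static void main(String[] args) {
        System.out.println(separate("CamelCaseToSomethingElse")); // Camel_Case_To_Something_Else
        System.out.println(separate("FooBar"));                   // Foo_Bar
    }
}
```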

score 1 · Answer 10 · answered Feb 03 '21 at 23:38

I am writing this answer for anyone who, for whatever reason, does not want to use Guava as shown below.

CaseFormat.UPPER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, "SomeInput");

In our case we had a problem with storage. There is another special case with Guava: if we have "Ph_D" as input, then we get "ph__d", with two underscores.

The code below worked for every case I tested.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String camelCaseToLowerCaseWithUnderscore(String string) {
    // Strings without any lowercase letter (e.g. all caps) are returned unchanged
    if (string.matches(".*[a-z].*")) {
        final Matcher matcher = Pattern.compile("(_?[A-Z][a-z]?)").matcher(string);

        StringBuffer stringBuffer = new StringBuffer();
        while (matcher.find()) {
            final String group = matcher.group();
            // Skip a match at the very beginning of the string and any capital
            // that already has an underscore in front of it
            if (matcher.start() > 0 && !group.startsWith("_")) {
                matcher.appendReplacement(stringBuffer, "_" + group);
            }
        }
        matcher.appendTail(stringBuffer);
        return stringBuffer.toString().toLowerCase();
    }
    else {
        return string;
    }
}

Ali · Answer 11 · 2022-08-21T21:49:12.497

Here is my solution with three regular expressions:

str.replaceAll("([^A-Z])([A-Z0-9])", "$1_$2") // standard replace
   .replaceAll("([A-Z]+)([A-Z0-9][^A-Z]+)", "$1_$2") // last letter after full uppercase
   .replaceAll("([0-9]+)([a-zA-Z]+)", "$1_$2") // letters after numbers
   .toLowerCase();

The result:

thisIsATest: this_is_a_test
EndWithNumber3: end_with_number_3
3ThisStartWithNumber: 3_this_start_with_number
Number3InMiddle: number_3_in_middle
Number3inMiddleAgain: number_3_in_middle_again
MyUUIDNot: my_uuid_not
HOLAMundo: hola_mundo
holaMUNDO: hola_mundo
with_underscore: with_underscore
withAUniqueLetter: with_a_unique_letter

Edited:

To support numbers and other symbols, you can use this:

str.replaceAll("([^A-Z])([A-Z])", "$1_$2") // standard replace
   .replaceAll("([A-Z]+)([^a-z][^A-Z]+)", "$1_$2") // last letter after full uppercase
   .toLowerCase()
   .replaceAll("([^a-z]+)([a-z]+)", "$1_$2") // letters after non-letters
   .replaceAll("([a-z]+)([^a-z]+)", "$1_$2"); // letters before non-letters

The result:

thisIsATest: "this_is_a_test"
EndWithNumber3: "end_with_number_3"
3ThisStartWithNumber: "3_this_start_with_number"
Number3InMiddle: "number_3_in_middle"
Number3inMiddleAgain: "number_3_in_middle_again"
MyUUIDNot: "my_uuid_not"
HOLAMundo: "hola_mundo"
holaMUNDO: "hola_mundo"
with_underscore: "with_underscore"
withAUniqueLetter: "with_a_unique_letter"
with%SYMBOLAndNumber90: "with_%_symbol_and_number_90"
http%: "http_%"
123456789: "123456789"
     : "     "
_: "_"
__abc__: "__abc__"

Fails with numbers and nulls – Sergey Nemchinov Aug 21 '22 at 13:53

Sergey Nemchinov · Answer 12 · 2022-08-23T13:58:32.603

Yet another solution with Apache Commons.

import org.apache.commons.lang3.StringUtils;

public static String toLowerUnderscore(String str) {
    if (str == null) {
        return null;
    }
    String[] tokens = StringUtils.splitByCharacterTypeCamelCase(str);
    String joined = StringUtils.join(tokens, '\t');
    String replaced = joined
            .replace("_\t", "_") // drop the separator after an existing underscore
            .replace("\t_", "_") // drop the separator before an existing underscore
            .replace("\t", "_"); // turn the remaining separators into underscores
    return replaced.toLowerCase();
}

Test cases (thanks @Ali):

thisIsATest:          this_is_a_test
EndWithNumber3:       end_with_number_3
3ThisStartWithNumber: 3_this_start_with_number
Number3InMiddle:      number_3_in_middle
Number3inMiddleAgain: number_3_in_middle_again
MyUUIDNot:            my_uuid_not
HOLAMundo:            hola_mundo
holaMUNDO:            hola_mundo
with_underscore:      with_underscore
withAUniqueLetter:    with_a_unique_letter
123456789:            123456789
"   ":                "   "
_:                    _
__abc__:              __abc__
null:                 null

score 0 · Answer 13 · answered Sep 06 '18 at 02:38

I've had to implement this to convert some keys in camel case format to lower case with underscores. The regular expression I came up with is:

(?<!^|_|[A-Z])([A-Z])

In English, it stands for a capital letter that is not preceded by the start of the string, an underscore, or another capital letter.

In the samples below, the characters in bold are the ones that should produce a match using the aforementioned regular expression:

Camel**C**ase**T**o**S**omething**E**lse
camel**C**ase**T**o**S**omething**E**lse
camel_case_to_something_else
Camel_Case_To_Something_Else
CAMEL_CASE_TO_SOMETHING_ELSE

Notice that the expression does not affect strings that are already in lower case + underscore format.

The replacement pattern would be:

_l$1

Which means lower case of first capturing group, first capturing group being the capital letter. You could lower case the whole string afterwards as well to normalize the last two samples from the list above.
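
In Java specifically, the replacement string of `String.replaceAll` has no case-conversion syntax, so one option is to insert only the underscore and lowercase the whole string afterwards, as suggested above. The following is a sketch of that option, not necessarily what the answerer used; `LookbehindDemo` and `toSnakeCase` are illustrative names.

```java
// Hypothetical demo class; the regex is the one from this answer.
class LookbehindDemo {
    static String toSnakeCase(String key) {
        // Match a capital that is not at the start of the string and does not
        // follow an underscore or another capital, prefix it with "_",
        // then lowercase the whole result.
        return key.replaceAll("(?<!^|_|[A-Z])([A-Z])", "_$1").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("CamelCaseToSomethingElse"));     // camel_case_to_something_else
        System.out.println(toSnakeCase("camel_case_to_something_else")); // camel_case_to_something_else
        System.out.println(toSnakeCase("CAMEL_CASE_TO_SOMETHING_ELSE")); // camel_case_to_something_else
    }
}
```

Lowercasing at the end is what normalizes the last two samples from the list, which the regular expression itself leaves untouched.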

score -1 · Answer 14 · answered Jan 21 '22 at 12:12

You can easily convert String to camel case using Stream API from Java 8 and method StringUtils.capitalize(..) from commons-lang

public String toCamelCase(String str) {
    return Arrays.stream(str.split("_"))
        .map(StringUtils::capitalize)
        .collect(Collectors.joining());
}
