How to convert camel case to snake case with two capitals next to each other

Question

I am trying to convert camel case to snake case.

Like this:

"LiveKarma" -> "live_karma"
"youGO" -> "you_g_o"

I cannot seem to get the second example working like that. It always outputs 'you_go'. How can I get it to output 'you_g_o'?

My code:

(Regex.Replace(line, "(?<=[a-z0-9])[A-Z]", "_$0", RegexOptions.Compiled)).ToLowerInvariant()

Do you need to use a regex? `Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems` ~Jamie Zawinski — gh9, Jul 23 '20 at 13:38
Your regular expression looks for a lowercase letter or a number followed by an uppercase letter. That seems at odds with what you're asking for in your question title. — ProgrammingLlama, Jul 23 '20 at 13:40
Perhaps change `[a-z0-9]` to `[a-zA-Z0-9]` https://regex101.com/r/Otna7T/1 — The fourth bird, Jul 23 '20 at 13:42
I'm not even sure that your code treats `LiveKarma` correctly: is the `L` really replaced with `l` in your tests? — Rafalon, Jul 23 '20 at 13:42
(?<!^)[A-Z] might be a clearer regex. i.e., match an uppercase character if you aren't at the beginning of the string. — aquinas, Jul 23 '20 at 14:01
I ran into the same issue. I tried to find out what the actual standard is for two consecutive upper case characters, and I couldn't find any documentation. Looking at how Newtonsoft does it, they don't add separators between upper case characters. https://github.com/JamesNK/Newtonsoft.Json/blob/d0a328e8a46304d62d2174b8bba54721d02be3d3/Src/Newtonsoft.Json/Utilities/StringUtils.cs#L243 If this is the standard, then the mapping isn't bijective, since e.g. `FOoBAr -> foo_bar -> FooBar` and `FooBar -> foo_bar -> FooBar`. I switched to avoiding consecutive upper case characters. — schwartz, Nov 11 '22 at 09:25
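
The lookbehind suggestion from the comments above can be sketched as follows. This is only a minimal illustration, assuming ASCII letters and non-null input; the class and method names (SnakeCaseSketch, ToSnakeCasePerCapital) are hypothetical and not part of the question.

```csharp
using System.Text.RegularExpressions;

public static class SnakeCaseSketch
{
    // Hypothetical helper name; the pattern is the one suggested in the comments.
    public static string ToSnakeCasePerCapital(string line)
    {
        // The lookbehind now also accepts an uppercase letter, so a capital that
        // directly follows another capital gets its own underscore as well.
        // The first character has nothing before it, so it is never prefixed.
        return Regex.Replace(line, "(?<=[a-zA-Z0-9])[A-Z]", "_$0").ToLowerInvariant();
    }
}
```

With this pattern, SnakeCaseSketch.ToSnakeCasePerCapital("youGO") should give "you_g_o" and "LiveKarma" should still give "live_karma".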

score 21 · Answer 1 · answered Jul 23 '20 at 13:55

Here is an extension method that converts the text to snake case:

using System;
using System.Text;

public static string ToSnakeCase(this string text)
{
    if(text == null) {
        throw new ArgumentNullException(nameof(text));
    }
    if(text.Length < 2) {
        return text;
    }
    var sb = new StringBuilder();
    sb.Append(char.ToLowerInvariant(text[0]));
    for(int i = 1; i < text.Length; ++i) {
        char c = text[i];
        if(char.IsUpper(c)) {
            sb.Append('_');
            sb.Append(char.ToLowerInvariant(c));
        } else {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Put it into a static class somewhere (named for example StringExtensions) and use it like this:

string text = "LiveKarma";
string snakeCaseText = text.ToSnakeCase();
// snakeCaseText => "live_karma"

GeekInside · Answer 2 · 2022-04-27T13:29:44.783

Since the option that converts abbreviations as separate words is not suitable for many, I found a complete solution in the EF Core codebase.

Here are a couple of examples of how the code works:

TestSC -> test_sc
testSC -> test_sc
TestSnakeCase -> test_snake_case
testSnakeCase -> test_snake_case
TestSnakeCase123 -> test_snake_case123
_testSnakeCase123 -> _test_snake_case123
test_SC -> test_sc

I rewrote it a bit so you can copy it as a ready-to-use string extension:

using System;
using System.Globalization;
using System.Text;

namespace Extensions
{
    public static class StringExtensions
    {
        public static string ToSnakeCase(this string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            var builder = new StringBuilder(text.Length + Math.Min(2, text.Length / 5));
            var previousCategory = default(UnicodeCategory?);

            for (var currentIndex = 0; currentIndex < text.Length; currentIndex++)
            {
                var currentChar = text[currentIndex];
                if (currentChar == '_')
                {
                    builder.Append('_');
                    previousCategory = null;
                    continue;
                }

                var currentCategory = char.GetUnicodeCategory(currentChar);
                switch (currentCategory)
                {
                    case UnicodeCategory.UppercaseLetter:
                    case UnicodeCategory.TitlecaseLetter:
                        if (previousCategory == UnicodeCategory.SpaceSeparator ||
                            previousCategory == UnicodeCategory.LowercaseLetter ||
                            previousCategory != UnicodeCategory.DecimalDigitNumber &&
                            previousCategory != null &&
                            currentIndex > 0 &&
                            currentIndex + 1 < text.Length &&
                            char.IsLower(text[currentIndex + 1]))
                        {
                            builder.Append('_');
                        }

                        currentChar = char.ToLower(currentChar, CultureInfo.InvariantCulture);
                        break;

                    case UnicodeCategory.LowercaseLetter:
                    case UnicodeCategory.DecimalDigitNumber:
                        if (previousCategory == UnicodeCategory.SpaceSeparator)
                        {
                            builder.Append('_');
                        }
                        break;

                    default:
                        if (previousCategory != null)
                        {
                            previousCategory = UnicodeCategory.SpaceSeparator;
                        }
                        continue;
                }

                builder.Append(currentChar);
                previousCategory = currentCategory;
            }

            return builder.ToString();
        }
    }
}

You can find the original code here: https://github.com/efcore/EFCore.NamingConventions/blob/main/EFCore.NamingConventions/Internal/SnakeCaseNameRewriter.cs

UPD 27.04.2022:

Also, you can use the Newtonsoft.Json library if you're looking for a ready-to-use third-party solution. Its output is the same as that of the code above.

// using Newtonsoft.Json.Serialization;
var snakeCaseStrategy = new SnakeCaseNamingStrategy();
var snakeCaseResult = snakeCaseStrategy.GetPropertyName(text, false);

This also produces the same results: `Regex.Replace(Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2"), "([a-z0-9])([A-Z])", "$1_$2").ToLower()` — jwatts1980, Oct 06 '22 at 11:08
The question is about how to add a separator in-between two subsequent upper case characters and this answer doesn't provide a solution for this. — schwartz, Nov 11 '22 at 08:36

Bar Nuri · Answer 3 · 2021-09-11T16:50:20.703

Using the Newtonsoft.Json package (requires `using Newtonsoft.Json.Serialization;`):

    public static string? ToCamelCase(this string? str) => str is null
        ? null
        : new DefaultContractResolver() { NamingStrategy = new CamelCaseNamingStrategy() }.GetResolvedPropertyName(str);

    public static string? ToSnakeCase(this string? str) => str is null
        ? null
        : new DefaultContractResolver() { NamingStrategy = new SnakeCaseNamingStrategy() }.GetResolvedPropertyName(str);

edited Sep 11 '21 at 16:50

answered Aug 25 '21 at 06:45

Maybe mention that you need to import the Newtonsoft JSON package for this solution – Benjineer Sep 10 '21 at 12:19
In a situation where we already use the package (NewtonSoft.Json), I think this can be used though. – Firanto Oct 19 '21 at 13:42

score 6 · Answer 4 · edited Mar 31 '22 at 23:31

A simple LINQ-based solution. I have no idea whether it's faster or not. It basically ignores consecutive uppercase letters.

public static string ToUnderscoreCase(this string str)
    => string.Concat((str ?? string.Empty).Select((x, i) => i > 0 && i < str.Length - 1 && char.IsUpper(x) && !char.IsUpper(str[i-1]) ? $"_{x}" : x.ToString())).ToLower();

edited Mar 31 '22 at 23:31 by s0n1c
answered May 22 '21 at 17:53 by danatcofo

I just tried to use this one, and it throws exceptions if the string ends with a capital letter because it's attempting to look at the next character in the string, but it's outside of the bounds of the array. – s0n1c Mar 31 '22 at 17:36

Rufus L · Answer 5 · 2020-07-23T15:24:35.623

RegEx Solution

A quick internet search turned up this site which has an answer using RegEx, which I had to modify to grab the Value portion in order for it to work on my machine (but it has the RegEx you're looking for). I also modified it to handle null input, rather than throwing an exception:

public static string ToSnakeCase2(string str)
{
    var pattern = 
        new Regex(@"[A-Z]{2,}(?=[A-Z][a-z]+[0-9]*|\b)|[A-Z]?[a-z]+[0-9]*|[A-Z]|[0-9]+");

    return str == null
        ? null
        : string
            .Join("_", pattern.Matches(str).Cast<Match>().Select(m => m.Value))
            .ToLower();
}

Non-RegEx Solution

For a non-regex solution, we can do the following:

1. Reduce each run of whitespace to a single '_' character by
   - using string.Split with an empty array as the first parameter to split on all whitespace
   - joining those parts back together with the '_' character
2. Prefix all upper-case characters with '_' and lower-case them
3. Split and re-join the resulting string on the '_' character to remove any instances of multiple consecutive underscores ("__") and to remove any leading or trailing instances of the character.

For example:

public static string ToSnakeCase(string str)
{
    return str == null
        ? null
        : string.Join("_", string.Concat(string.Join("_", str.Split(new char[] {},
            StringSplitOptions.RemoveEmptyEntries))
            .Select(c => char.IsUpper(c)
                ? $"_{c}".ToLower()
                : $"{c}"))
            .Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries));
}

I haven't tested yours, but this one works on all the cases I threw at it: `Regex.Replace(Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2"), "([a-z0-9])([A-Z])", "$1_$2").ToLower()` — jwatts1980, Oct 06 '22 at 11:10

score 2 · Answer 6 · answered Nov 21 '21 at 15:22

There is a well maintained EF Core community project that implements a number of naming convention rewriters called EFCore.NamingConventions. The rewriters don't have any internal dependencies, so if you don't want to bring in an EF Core related package you can just copy the rewriter code out.

Here is the snake case rewriter: https://github.com/efcore/EFCore.NamingConventions/blob/main/EFCore.NamingConventions/Internal/SnakeCaseNameRewriter.cs

gh9 · Answer 7 · 2020-07-23T14:03:19.660

A rough sketch is below. In essence, check whether each char is upper case; if it is, add a _, then add the char in lower case.

// Assumes s is a non-null, non-empty string.
// The first character is lower-cased without a leading underscore.
var newString = s.Substring(0, 1).ToLower();
foreach (char c in s.Substring(1))
{
    if (char.IsUpper(c))
    {
        newString = newString + "_";
    }
    newString = newString + char.ToLower(c);
}

edited Jul 23 '20 at 14:03

answered Jul 23 '20 at 13:44

You'd have to treat first character differently, as you do not want `_live_karma` with input `LiveKarma`. Also, when *building* a string like this, you might prefer to use a `StringBuilder` – Rafalon Jul 23 '20 at 13:44

score 1 · Answer 8 · answered Sep 23 '21 at 09:09

If you're into micro-optimizations and want to prevent unnecessary conversions wherever possible, this one might also work:

    public static string ToSnakeCase(this string text)
    {
        static IEnumerable<char> Convert(CharEnumerator e)
        {
            if (!e.MoveNext()) yield break;
            yield return char.ToLower(e.Current);
            while (e.MoveNext())
            {
                if (char.IsUpper(e.Current))
                {
                    yield return '_';
                    yield return char.ToLower(e.Current);
                }
                else
                {
                    yield return e.Current;
                }
            }
        }

        return new string(Convert(text.GetEnumerator()).ToArray());
    }

score 1 · Answer 9 · answered Oct 06 '22 at 11:15

May as well toss this one out there. Very simple, and it worked for me.

public static string ToSnakeCase(this string text)
{
    text = Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2");
    text = Regex.Replace(text, "([a-z0-9])([A-Z])", "$1_$2");
    return text.ToLower();
}

Testing it with some samples (borrowed from @GeekInside's answer):

var samples = new List<string>() { "TestSC", "testSC", "TestSnakeCase", "testSnakeCase", "TestSnakeCase123", "_testSnakeCase123", "test_SC" };
var results = new List<string>() { "test_sc", "test_sc", "test_snake_case", "test_snake_case", "test_snake_case123", "_test_snake_case123", "test_sc" };
for (int i = 0; i < samples.Count; i++)
{
    Console.WriteLine("Test success: " + (samples[i].ToSnakeCase() == results[i] ? "true" : "false"));
}

Produced the following output:

Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true
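
For the input from the question itself, this approach appears to keep a run of capitals together rather than splitting it, which matches the samples above but not the 'you_g_o' output the question asks for. A minimal sketch to check that, assuming the same two replacements wrapped in a hypothetical helper named ToSnakeCaseKeepingRuns:

```csharp
using System.Text.RegularExpressions;

public static class AcronymCheck
{
    // Hypothetical wrapper around the same two replacements as the answer above.
    public static string ToSnakeCaseKeepingRuns(string text)
    {
        // Step 1: underscore before a capital that starts a capitalized word.
        text = Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2");
        // Step 2: underscore between a lowercase letter or digit and a capital.
        text = Regex.Replace(text, "([a-z0-9])([A-Z])", "$1_$2");
        return text.ToLowerInvariant();
    }
}
```

Here AcronymCheck.ToSnakeCaseKeepingRuns("youGO") should give "you_go", while "LiveKarma" gives "live_karma". Whether that is the desired behavior depends on whether consecutive capitals are meant to be treated as one word.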
