17

I am trying to convert camel case to snake case.

Like this:

"LiveKarma" -> "live_karma"
"youGO" -> "you_g_o"

I cannot get the second example to work like that. My code always outputs 'you_go'. How can I get it to output 'you_g_o'?

My code:

(Regex.Replace(line, "(?<=[a-z0-9])[A-Z]", "_$0", RegexOptions.Compiled)).ToLowerInvariant()
Mark Rotteveel
Live Karma
  • Do you need to use a regex? `Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems` ~Jamie Zawinski – gh9 Jul 23 '20 at 13:38
  • No I don't need regex – Live Karma Jul 23 '20 at 13:39
  • Your regular expression looks for a lowercase letter or a number followed by an uppercase letter. That seems at odds with what you're asking for in your question title. – ProgrammingLlama Jul 23 '20 at 13:40
  • Perhaps change `[a-z0-9]` to `[a-zA-Z0-9]` https://regex101.com/r/Otna7T/1 – The fourth bird Jul 23 '20 at 13:42
  • I'm not even sure that your code treats `LiveKarma` correctly: is the `L` really replaced with `l` in your tests? – Rafalon Jul 23 '20 at 13:42
  • Sorry I forgot to say I also do ToLowerInvariant() – Live Karma Jul 23 '20 at 13:49
  • Oh, so your code is only for adding underscores! – Rafalon Jul 23 '20 at 13:50
  • (?<!^)[A-Z] might be a clearer regex. i.e., match an uppercase character if you aren't at the beginning of the string. – aquinas Jul 23 '20 at 14:01
  • I ran into the same issue. I tried to find out what the actual standard is for two consecutive upper-case characters and couldn't find any documentation. Looking at how Newtonsoft does it, they don't add separators between upper-case characters. https://github.com/JamesNK/Newtonsoft.Json/blob/d0a328e8a46304d62d2174b8bba54721d02be3d3/Src/Newtonsoft.Json/Utilities/StringUtils.cs#L243 If this is the standard, then the mapping isn't bijective, since e.g. `FOoBAr -> foo_bar -> FooBar` and `FooBar ->foo_bar -> FooBar`. I changed my code to avoid consecutive upper-case characters. – schwartz Nov 11 '22 at 09:25
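The lookbehind suggested in the comments above can be sketched as follows. This is a minimal illustration, not the asker's final code: the class name `SnakeCaseRegexSketch` and the method name `ToSnakeCaseEveryCapital` are made up for this example, and it assumes only ASCII capitals need separating.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SnakeCaseRegexSketch
{
    // Hypothetical helper: inserts '_' before every ASCII upper-case letter
    // that is not at the very start of the string, then lower-cases the result.
    public static string ToSnakeCaseEveryCapital(string line) =>
        Regex.Replace(line, "(?<!^)[A-Z]", "_$0").ToLowerInvariant();
}
```

With this pattern, `ToSnakeCaseEveryCapital("youGO")` gives "you_g_o" and `ToSnakeCaseEveryCapital("LiveKarma")` gives "live_karma", because the lookbehind no longer requires the preceding character to be lower case or a digit.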

9 Answers

21

Here is an extension method that converts the text to snake case:

using System;
using System.Text;

public static string ToSnakeCase(this string text)
{
    if(text == null) {
        throw new ArgumentNullException(nameof(text));
    }
    if(text.Length < 2) {
        // Empty and single-character strings need no separator, only lower-casing.
        return text.ToLowerInvariant();
    }
    var sb = new StringBuilder();
    sb.Append(char.ToLowerInvariant(text[0]));
    for(int i = 1; i < text.Length; ++i) {
        char c = text[i];
        if(char.IsUpper(c)) {
            sb.Append('_');
            sb.Append(char.ToLowerInvariant(c));
        } else {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Put it into a static class somewhere (named for example StringExtensions) and use it like this:

string text = "LiveKarma";
string snakeCaseText = text.ToSnakeCase();
// snakeCaseText => "live_karma"
GregorMohorko
12

Since the option that converts abbreviations as separate words is not suitable for many, I found a complete solution in the EF Core codebase.

Here are a couple of examples of how the code works:

TestSC -> test_sc
testSC -> test_sc
TestSnakeCase -> test_snake_case
testSnakeCase -> test_snake_case
TestSnakeCase123 -> test_snake_case123
_testSnakeCase123 -> _test_snake_case123
test_SC -> test_sc

I rewrote it a bit so you can copy it as a ready-to-use string extension:

using System;
using System.Globalization;
using System.Text;

namespace Extensions
{
    public static class StringExtensions
    {
        public static string ToSnakeCase(this string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            var builder = new StringBuilder(text.Length + Math.Min(2, text.Length / 5));
            var previousCategory = default(UnicodeCategory?);

            for (var currentIndex = 0; currentIndex < text.Length; currentIndex++)
            {
                var currentChar = text[currentIndex];
                if (currentChar == '_')
                {
                    builder.Append('_');
                    previousCategory = null;
                    continue;
                }

                var currentCategory = char.GetUnicodeCategory(currentChar);
                switch (currentCategory)
                {
                    case UnicodeCategory.UppercaseLetter:
                    case UnicodeCategory.TitlecaseLetter:
                        if (previousCategory == UnicodeCategory.SpaceSeparator ||
                            previousCategory == UnicodeCategory.LowercaseLetter ||
                            previousCategory != UnicodeCategory.DecimalDigitNumber &&
                            previousCategory != null &&
                            currentIndex > 0 &&
                            currentIndex + 1 < text.Length &&
                            char.IsLower(text[currentIndex + 1]))
                        {
                            builder.Append('_');
                        }

                        currentChar = char.ToLower(currentChar, CultureInfo.InvariantCulture);
                        break;

                    case UnicodeCategory.LowercaseLetter:
                    case UnicodeCategory.DecimalDigitNumber:
                        if (previousCategory == UnicodeCategory.SpaceSeparator)
                        {
                            builder.Append('_');
                        }
                        break;

                    default:
                        if (previousCategory != null)
                        {
                            previousCategory = UnicodeCategory.SpaceSeparator;
                        }
                        continue;
                }

                builder.Append(currentChar);
                previousCategory = currentCategory;
            }

            return builder.ToString();
        }
    }
}

You can find the original code here: https://github.com/efcore/EFCore.NamingConventions/blob/main/EFCore.NamingConventions/Internal/SnakeCaseNameRewriter.cs

UPD 27.04.2022:

Alternatively, you can use the Newtonsoft.Json library if you're looking for a ready-to-use third-party solution. Its output is the same as that of the code above.

// using Newtonsoft.Json.Serialization;
var snakeCaseStrategy = new SnakeCaseNamingStrategy();
var snakeCaseResult = snakeCaseStrategy.GetPropertyName(text, false);
GeekInside
  • This also produces the same results: `Regex.Replace(Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2"), "([a-z0-9])([A-Z])", "$1_$2").ToLower()` – jwatts1980 Oct 06 '22 at 11:08
  • The question is about how to add a separator in-between two subsequent upper case characters and this answer doesn't provide a solution for this. – schwartz Nov 11 '22 at 08:36
7

Using the Newtonsoft.Json package (requires `using Newtonsoft.Json.Serialization;`):

    public static string? ToCamelCase(this string? str) => str is null
        ? null
        : new DefaultContractResolver() { NamingStrategy = new CamelCaseNamingStrategy() }.GetResolvedPropertyName(str);

    public static string? ToSnakeCase(this string? str) => str is null
        ? null
        : new DefaultContractResolver() { NamingStrategy = new SnakeCaseNamingStrategy() }.GetResolvedPropertyName(str);
Bar Nuri
6

A simple LINQ-based solution. I have no idea whether it's faster or not. It basically ignores consecutive upper-case characters.

public static string ToUnderscoreCase(this string str)
    => string.Concat((str ?? string.Empty).Select((x, i) => i > 0 && i < str.Length - 1 && char.IsUpper(x) && !char.IsUpper(str[i-1]) ? $"_{x}" : x.ToString())).ToLower();
s0n1c
danatcofo
  • I just tried to use this one, and it throws exceptions if the string ends with a capital letter because it's attempting to look at the next character in the string, but it's outside of the bounds of the array. – s0n1c Mar 31 '22 at 17:36
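For contrast, here is a sketch of a LINQ variant that does separate consecutive capitals, which is what the question asks for. The names `UnderscoreCaseSketch` and `ToUnderscoreCaseEveryCapital` are hypothetical, and the sketch never indexes into the string, so it does not depend on neighbouring characters.

```csharp
using System;
using System.Linq;

public static class UnderscoreCaseSketch
{
    // Hypothetical variant: prefixes every upper-case character except the
    // first character of the string with '_', so "GO" becomes "_g_o".
    // A null input is treated as an empty string.
    public static string ToUnderscoreCaseEveryCapital(this string str)
        => string.Concat((str ?? string.Empty)
                .Select((x, i) => i > 0 && char.IsUpper(x) ? $"_{x}" : x.ToString()))
            .ToLowerInvariant();
}
```

Under these assumptions, `"youGO".ToUnderscoreCaseEveryCapital()` returns "you_g_o" and `"LiveKarma".ToUnderscoreCaseEveryCapital()` returns "live_karma".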
3

RegEx Solution

A quick internet search turned up this site which has an answer using RegEx, which I had to modify to grab the Value portion in order for it to work on my machine (but it has the RegEx you're looking for). I also modified it to handle null input, rather than throwing an exception:

public static string ToSnakeCase2(string str)
{
    var pattern = 
        new Regex(@"[A-Z]{2,}(?=[A-Z][a-z]+[0-9]*|\b)|[A-Z]?[a-z]+[0-9]*|[A-Z]|[0-9]+");

    return str == null
        ? null
        : string
            .Join("_", pattern.Matches(str).Cast<Match>().Select(m => m.Value))
            .ToLower();
}

Non-RegEx Solution

For a non-regex solution, we can do the following:

  1. Reduce all whitespace to a single space by
    • using string.Split to split with an empty array as the first parameter to split on all whitespace
    • joining those parts back together with the '_' character
  2. Prefix all upper-case characters with '_' and lower-case them
  3. Split and re-join the resulting string on the _ character to remove any runs of consecutive underscores ("__") and any leading or trailing underscores.

For example:

public static string ToSnakeCase(string str)
{
    return str == null
        ? null
        : string.Join("_", string.Concat(string.Join("_", str.Split(new char[] {},
            StringSplitOptions.RemoveEmptyEntries))
            .Select(c => char.IsUpper(c)
                ? $"_{c}".ToLower()
                : $"{c}"))
            .Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries));
}
Rufus L
  • I haven't tested yours, but this one works on all the cases I threw at it: `Regex.Replace(Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2"), "([a-z0-9])([A-Z])", "$1_$2").ToLower()` – jwatts1980 Oct 06 '22 at 11:10
2

There is a well maintained EF Core community project that implements a number of naming convention rewriters called EFCore.NamingConventions. The rewriters don't have any internal dependencies, so if you don't want to bring in an EF Core related package you can just copy the rewriter code out.

Here is the snake case rewriter: https://github.com/efcore/EFCore.NamingConventions/blob/main/EFCore.NamingConventions/Internal/SnakeCaseNameRewriter.cs

satnhak
1

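A rough sketch is below. In essence: lower-case the first character, then for each remaining character, add a _ first if it is upper case, and append the character in lower case.

// Assumes s is a non-empty string.
// Lower-case the first character without prefixing an underscore.
var newString = s.Substring(0, 1).ToLowerInvariant();
foreach (char c in s.Substring(1))
{
    // Insert a separator before every remaining upper-case character.
    if (char.IsUpper(c))
    {
        newString = newString + "_";
    }
    newString = newString + char.ToLowerInvariant(c);
}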
gh9
  • You'd have to treat first character differently, as you do not want `_live_karma` with input `LiveKarma`. Also, when *building* a string like this, you might prefer to use a `StringBuilder` – Rafalon Jul 23 '20 at 13:44
1

If you're into micro-optimizations and want to prevent unnecessary conversions wherever possible, this one might also work:

    public static string ToSnakeCase(this string text)
    {
        static IEnumerable<char> Convert(CharEnumerator e)
        {
            if (!e.MoveNext()) yield break;
            yield return char.ToLower(e.Current);
            while (e.MoveNext())
            {
                if (char.IsUpper(e.Current))
                {
                    yield return '_';
                    yield return char.ToLower(e.Current);
                }
                else
                {
                    yield return e.Current;
                }
            }
        }

        return new string(Convert(text.GetEnumerator()).ToArray());
    }
realbart
1

Might as well toss this one in. It is very simple and worked for me.

public static string ToSnakeCase(this string text)
{
    text = Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2");
    text = Regex.Replace(text, "([a-z0-9])([A-Z])", "$1_$2");
    return text.ToLower();
}

Testing it with some samples (borrowed from @GeekInside's answer):

var samples = new List<string>() { "TestSC", "testSC", "TestSnakeCase", "testSnakeCase", "TestSnakeCase123", "_testSnakeCase123", "test_SC" };
var results = new List<string>() { "test_sc", "test_sc", "test_snake_case", "test_snake_case", "test_snake_case123", "_test_snake_case123", "test_sc" };
for (int i = 0; i < samples.Count; i++)
{
    Console.WriteLine("Test success: " + (samples[i].ToSnakeCase() == results[i] ? "true" : "false"));
}

Produced the following output:

Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true
jwatts1980