332

Example:

>>> convert('CamelCase')
'camel_case'
danijar
Sridhar Ratnakumar
  • 28
    To convert in the other direction, see this [other](http://stackoverflow.com/questions/4303492/how-can-i-simplify-this-conversion-from-underscore-to-camelcase-in-python) stackoverflow question. – Nathan Sep 30 '11 at 21:30
  • 10
    n.b. that's `NotCamelCase` but `thisIs` – Matt Richards Jun 23 '14 at 15:59
  • 5
    @MattRichards It is a matter of dispute. [wiki](https://en.wikipedia.org/wiki/CamelCase#Variations_and_synonyms) – Piotr Siupa Aug 05 '15 at 10:38
  • @MattRichards For example in Java they use both, CamelCase is used for naming Class definitions, while camelCase is used for naming initialized variables. – darkless Feb 04 '17 at 00:57

30 Answers

1153

Camel case to snake case

import re

name = 'CamelCaseName'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name)  # camel_case_name

If you do this many times and the above is slow, compile the regex beforehand:

pattern = re.compile(r'(?<!^)(?=[A-Z])')
name = pattern.sub('_', name).lower()
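To see where the zero-width pattern above actually matches, a small sketch can list the match positions. The helper name `boundaries` is made up for illustration. Every position directly before an uppercase letter (except position 0) is a match, which is also why a run of capitals such as `API` ends up split letter by letter.

```python
import re

# Zero-width pattern from the answer: not at the start of the string,
# and immediately followed by an uppercase letter.
pattern = re.compile(r'(?<!^)(?=[A-Z])')

def boundaries(name):
    """Return the indices where an underscore would be inserted (hypothetical helper)."""
    return [m.start() for m in pattern.finditer(name)]

print(boundaries('CamelCaseName'))         # [5, 9]
print(boundaries('APIKey'))                # [1, 2, 3]
print(pattern.sub('_', 'APIKey').lower())  # a_p_i_key
```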

To handle more advanced cases specially (this is not reversible anymore):

def camel_to_snake(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

print(camel_to_snake('camel2_camel2_case'))  # camel2_camel2_case
print(camel_to_snake('getHTTPResponseCode'))  # get_http_response_code
print(camel_to_snake('HTTPResponseCodeXYZ'))  # http_response_code_xyz

To also handle cases with two or more underscores:

def to_snake_case(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    name = re.sub('__([A-Z])', r'_\1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()
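As a rough check of what the extra substitution buys, the sketch below runs the two-step and three-step versions side by side on inputs that already contain an underscore before a capital letter. The sample strings are chosen here for illustration only.

```python
import re

def camel_to_snake(name):
    # Two-step version from above.
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

def to_snake_case(name):
    # Three-step version: the middle step collapses the doubled
    # underscore that step one creates in front of "_Word".
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    name = re.sub('__([A-Z])', r'_\1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()

for sample in ('camel_Case', '_Camel_Case', 'getHTTPResponseCode'):
    print(sample, '->', camel_to_snake(sample), '|', to_snake_case(sample))
# camel_Case -> camel__case | camel_case
# _Camel_Case -> __camel__case | _camel_case
# getHTTPResponseCode -> get_http_response_code | get_http_response_code
```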

Snake case to pascal case

name = 'snake_case_name'
name = ''.join(word.title() for word in name.split('_'))
print(name)  # SnakeCaseName
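The one-liner above capitalizes every word, so it yields PascalCase. A minimal sketch of the lower camelCase variant, assuming the first word should be left untouched (the function names are made up for illustration):

```python
def snake_to_pascal(name):
    # Capitalize every underscore-separated word.
    return ''.join(word.title() for word in name.split('_'))

def snake_to_camel(name):
    # Keep the first word as-is and capitalize only the remaining words.
    first, *rest = name.split('_')
    return first + ''.join(word.title() for word in rest)

print(snake_to_pascal('snake_case_name'))  # SnakeCaseName
print(snake_to_camel('snake_case_name'))   # snakeCaseName
```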
Jemshit
epost
  • 3
    This solution fails in these cases: _test_Method, __test__Method, _Test, getHTTPresponseCode, __CamelCase, and _Camel_Case. – freegnu May 16 '11 at 14:12
  • @freegnu, Fails how? It appears to work for me for all of those examples. – Cerin Jul 08 '11 at 22:43
  • 8
    how about the reverse? Convert a `not_camel_case` to `notCamelCase` and/or `NotCamelCase`? – john2x Aug 14 '11 at 22:59
  • 13
    To avoid double underscores when converting e.g. camel_Case, add this line: `s2.replace('__', '_')` – Marcus Ahlberg Aug 13 '13 at 11:15
  • Small fix: put `$` at the beginning of the first pattern so it won't prepend `_` at the beginning like so `"Document" -> "_document"` – astronaut Oct 22 '13 at 15:38
  • Just updated your first regex to match 's' for plural form. It enables you to convert floatingIPsAdresses to floating_ips_addresses instead of floating_i_ps_addresses. Hope it can help :p => re.compile('(.)([A-Z](?!s[A-Z])[a-z]+)') – t00f Jul 21 '14 at 21:27
  • 3
    Note this is not very reversible. getHTTPResponseCode should convert to get_h_t_t_p_response_code. getHttpResponseCode should convert to get_http_response_code – K2xL Feb 05 '15 at 20:35
  • Seems to be working flawlessly. Only if there was some explanation given for it.... – Anmol Singh Jaggi Mar 27 '16 at 03:32
  • 4
    @AnmolSinghJaggi The first regex handles the edge case of an acronym followed by another word (e.g. "HTTPResponse" -> "HTTP_Response") OR the more normal case of an initial lowercase word followed by a capitalized word (e.g. "getResponse" -> "get_Response". The second regex handles the normal case of two non-acronyms (e.g. "ResponseCode" -> "Response_Code") followed by a final call to lowercase everything. Thus "getHTTPResponseCode" -> "getHTTP_ResponseCode" -> "get_HTTP_Response_Code" -> "get_http_response_code" – Jeff Moser Apr 19 '16 at 15:12
  • 5
    `convert = lambda name: re.sub('((?!^)(?<!_)[A-Z][a-z]+|(?<=[a-z0-9])[A-Z])', r'_\1', name).lower()` - This handles freegnu's cases correctly. – cco Oct 29 '16 at 02:12
  • Nice code, but why not do it the Pythonic way? Use a library! http://stackoverflow.com/a/17328907/1450294 – Michael Scheper Mar 03 '17 at 15:19
  • In PEP8 standard, class names must be the CapWords convention. So what if we only want to replace phrases if they start with lowercase, as in "x = CamelCase(numberFive + numberSix)" becomes "x = CamelCase(number_five + number_six)". I'm kind of thinking of doing a regex that matches first letter lowercase followed by letters/digits/underscore, then on this regex call your convert() function like: re.sub("\\b([a-z]\w+)", convert(r"\1"), "x = CamelCase(numberFive + numberSix)"). But this doesn't work even though convert("numberFive") works. – snaran Jul 19 '17 at 18:01
  • **train case** would be much better than **snake case**, IMHO – unless you are a snake lover. (Sorry for commenting here, comments to the question are disabled) – Walter Tross May 02 '20 at 09:25
  • 2
    `APIKey` => `a_p_i_key` – vaughan May 08 '20 at 02:35
  • def to_snake(camel_input): camel_input = camel_input.replace('&', '_and_') camel_input = re.sub(r'(?<!^)(?=[A-Z])', '_', camel_input).lower() words = re.findall(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+', camel_input) return '_'.join(words) – Farshid Jan 11 '21 at 14:22
  • Although this is a very old question, it's worth pointing out that this accepted answer is not even producing camel case - it's producing pascal case, as it capitalises the first char. – charlie6411 Sep 01 '21 at 15:52
  • Real **camelCase** implementation: `''.join(word.title() if index_word > 0 else word for index_word, word in enumerate(name.split('_')))` – Jože Ws Dec 14 '21 at 22:48
  • I was looking for a similar solution, to convert strings like RetainUIDs, but this solution seems to fail as `camel_to_snake('RetainUIDs')` returns retain_ui_ds instead of the expected retain_uids – Jonathan Dec 15 '21 at 09:34
  • What about from `ISOCurrencyCode` to `iso_currency_code`? – Shift 'n Tab Jun 28 '22 at 12:47
  • 1
    Why don't use pydash (https://pydash.readthedocs.io/en/latest/)? – Evgenii Jul 21 '22 at 10:40
301

There's an inflection library in the package index that can handle these things for you. In this case, you'd be looking for inflection.underscore():

>>> inflection.underscore('CamelCase')
'camel_case'
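If adding the dependency is not an option, the two-regex approach quoted in the comments on this answer can be sketched with the standard library alone. This is a stand-in written under that assumption, not the library's code, and the name `underscore_sketch` is made up for illustration.

```python
import re

def underscore_sketch(word):
    # Split an acronym from a following capitalized word: "HTTPResponse" -> "HTTP_Response".
    word = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", word)
    # Split a lowercase letter or digit from a following capital: "getHTTP" -> "get_HTTP".
    word = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", word)
    # Treat dashes like underscores, then lowercase everything.
    return word.replace("-", "_").lower()

print(underscore_sketch('CamelCase'))            # camel_case
print(underscore_sketch('getHTTPResponseCode'))  # get_http_response_code
print(underscore_sketch('APIKey'))               # api_key
```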
Brad Koch
  • 68
    I dont understand why people are up voting the use of custom functions when there is a great library that performs this task. We should not be reinventing the wheel. – oden Jul 05 '15 at 03:26
  • 182
    @oden Maybe because adding an **entire new dependency** to do the job of a single-line function is fragile over-kill? – Cecil Curry Dec 21 '15 at 06:23
  • 21
    For one instance, sure it's overkill. Across a larger application, no need to reinvent and obfuscate the wheel. – Brad Koch Dec 21 '15 at 22:47
  • 19
    Regexes back a lot into a "single line", which is why it's lot more than one line with proper testing. – studgeek May 10 '17 at 16:38
  • I'm not sure it works great, it doesn't capture all variations: 'TotalWeight': 840.0, 'TrackingNumber': '', 'Vendor': 'NONE', 'address1': 'du mail', 'address2': '', 'address3': '', – Vladimir Stazhilov Jan 22 '18 at 01:12
  • 32
    @CecilCurry: I'm sure you're a great programmer, but I'm not sure there aren't cases that you haven't considered—just look at other answers here for examples. That's why I'll always choose a library, because it's the sum experience of many more devs than just me. – Michael Scheper Sep 27 '18 at 14:55
  • 3
    @MichaelScheper You can always just copy the `inflection` library code: `re.sub(r"([a-z\d])([A-Z])", r'\1_\2', re.sub(r"([A-Z]+)([A-Z][a-z])", r'\1_\2', word)).replace("-", "_").lower()` https://inflection.readthedocs.io/en/latest/_modules/inflection.html#underscore – André C. Andersen Jan 11 '19 at 19:27
  • 1
    @MichaelScheper No offense intended. It was an attempt at a compromise: Use the wisdom of the community, and not get more dependencies than necessary. On a side note, you might consider not using idioms like "why on earth". [It can be interpreted as unfriendly](https://stackoverflow.com/conduct). – André C. Andersen Jan 19 '19 at 23:07
  • 1
    @AndréC.Andersen: No offence intended nor taken, but I am truly baffled by the lengths some people are willing to go to avoid an 80kB library. The reasons may be obvious enough for them to consider sarcastic boldface appropriate (see above), but they're not obvious to me, hence the idiom. A big part of the power of Python is the range of libraries available, and in my many years of writing Python applications on a half-dozen platforms, I've never discovered any disadvantage for using them (except a few 'problem' ones, but the inflection library isn't one). I just want to understand the motive. – Michael Scheper Jan 20 '19 at 22:45
  • @MichaelScheper There are many reasons you might want to limit the number of dependencies. One is bloat, but isn't really a strong argument seeing as it was a small library. Another is dependency [conflicts, etc](https://en.wikipedia.org/wiki/Dependency_hell#Problems). Last time I experienced this was with Airflow insisting on using an old version of Jinja2. I think there is a balance to these things. IMO a one-liner upvoted by 613 people on SO is not worth creating another dependency for. But, this is no science, some things comes down to preference. – André C. Andersen Jan 20 '19 at 23:35
  • @AndréC.Andersen: That's a fair argument in general, and I've had to deal with a couple of problem libraries too. I've just also had to fix a lot of bugs caused by people reinventing the wheel. I don't have anything further to add, except that I'm sorry if anybody was offended by my earlier comment. I don't think it was out of step with the tone created earlier, though. Anyhow, thanks. – Michael Scheper Jan 21 '19 at 00:35
  • I feel you on the "reinventing the wheel" problem. Also, don't worry about the idiom thing, I didn't see it in context of other commenters. Have a nice one. – André C. Andersen Jan 21 '19 at 08:32
  • 1
    @oden a sufficiently upvoted Python answer on StackOveflow is indistinguishable from a library. – Boris Verkhovskiy Aug 06 '19 at 05:21
  • 1
    It has been 3 years since up voting this answer and commenting that it is the best solution, I now feel differently. We are now mindful of adding in new libraries to our codebase and would prefer a custom function for many of the reasons stated above. Maintaining a codebase over many years makes you think differently then writing disposable code or code that someone else lives with... – oden Aug 07 '19 at 08:29
  • 3
    @oden you could just copy paste the entire library, which is [a single file](https://github.com/jpvanhal/inflection/blob/master/inflection.py) into your code base, which is slightly better than copy pasting an answer from stack overflow and never updating. If it makes you feel better, the library also hasn't been updated since 2015. – Boris Verkhovskiy Sep 10 '19 at 21:45
  • 1
    Is it really about the pros and cons of a "one liner" vs. "importing a library"? What about if you need this "one liner" in your code again (and again)? You'll probably start defining (and calling) a function which won't be a "one liner" anymore. And if you need it in different modules (or packages) you'll probably start creating a utils module or package yourself. In those cases I'd tend to consider using a proven library. Specially when converting to e.g. snakecase in your code also needs a converse function to the original case the (camel case, pascal case etc.). – wolfrevo Feb 06 '22 at 10:07
  • Inflection has 0 external dependencies. – webelo Nov 03 '22 at 23:16
  • @oden mainly because every third-party module is a code/security risk. Secondly because of the possible added overhead/dependencies. Thirdly because of the ease of adaption/modification to slightly different use-cases. – Gergely M Mar 14 '23 at 11:08
156

I don't know why these are all so complicated.

For most cases, the simple expression ([A-Z]+) will do the trick:

>>> re.sub('([A-Z]+)', r'_\1','CamelCase').lower()
'_camel_case'  
>>> re.sub('([A-Z]+)', r'_\1','camelCase').lower()
'camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camel2Case2').lower()
'camel2_case2'
>>> re.sub('([A-Z]+)', r'_\1','camelCamelCase').lower()
'camel_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'

To ignore the first character, simply add the negative lookahead (?!^):

>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCase').lower()
'camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCamelCase').lower()
'camel_camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','Camel2Camel2Case').lower()
'camel2_camel2_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'

If you want to separate ALLCaps into all_caps and expect numbers in your string, you still don't need two separate passes; just use |. The expression ((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z])) can handle just about every scenario in the book:

>>> a = re.compile('((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))')
>>> a.sub(r'_\1', 'getHTTPResponseCode').lower()
'get_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponseCode').lower()
'get2_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponse123Code').lower()
'get2_http_response123_code'
>>> a.sub(r'_\1', 'HTTPResponseCode').lower()
'http_response_code'
>>> a.sub(r'_\1', 'HTTPResponseCodeXYZ').lower()
'http_response_code_xyz'
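For inputs that may already contain underscores, a hedged variation is to add a `(?<!_)` guard to the second alternative, so that a capital directly after an underscore is left alone. The sketch below assumes that is the desired behaviour; the names `guarded` and `to_snake` are made up for illustration.

```python
import re

# Same alternation as above, plus (?<!_) in the second branch so that
# a capital letter right after an existing underscore gets no extra underscore.
guarded = re.compile(r'((?<=[a-z0-9])[A-Z]|(?!^)(?<!_)[A-Z](?=[a-z]))')

def to_snake(name):
    return guarded.sub(r'_\1', name).lower()

print(to_snake('getHTTPResponseCode'))      # get_http_response_code
print(to_snake('Camel2WARNING_Case_CASE'))  # camel2_warning_case_case
print(to_snake('camel_Case'))               # camel_case
```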

It all depends on what you want so use the solution that best suits your needs as it should not be overly complicated.

nJoy!

nickl-
  • 1
    The last iteration is the most clever, IMO. It took me a little bit to understand that it's only replacing the single character at the beginning of each word -- and that was only because the approach was different than one I'd come up with myself. Nicely done. – Justin Miller Jun 30 '14 at 12:23
  • 4
    I was puzzled by the `(?!^)` expression being called a look-behind. Unless I'm missing something, what we really want here is a negative look-behind which should be expressed as `(?<!^)`. For reasons I cannot understand your negative look-ahead `(?!^)` seems to work, too... – Apteryx Jul 08 '16 at 19:28
  • 11
    This doesn't handle preexisting underscores well: `"Camel2WARNING_Case_CASE"` becomes `"camel2_warning_case__case"`. You can add a `(?<!_)` negative lookbehind, to solve it: `re.sub('((?<=[a-z0-9])[A-Z]|(?!^)(?<!_)[A-Z](?=[a-z]))', r'_\1', "Camel2WARNING_Case_CASE").lower()` returns `'camel2_warning_case_case'` – luckydonald Jun 20 '17 at 12:55
  • 3
    @Apteryx You're right, `(?!^)` was incorrectly called a "look behind" and should have instead been called a _negative lookahead assertion_. As [this nice explanation](https://stackoverflow.com/a/2973495/7232335) shows, negative lookaheads usually come _after_ the expression you're searching for. So you can think of `(?!^)` as "find `''` where the start of the string does not follow". Indeed, a negative lookbehind also works: you can think of `(?<!^)` as "find `''` where the start of the string does not precede". – Nathaniel Jones Jan 09 '19 at 18:42
88

Avoiding libraries and regular expressions:

def camel_to_snake(s):
    return ''.join(['_'+c.lower() if c.isupper() else c for c in s]).lstrip('_')
>>> camel_to_snake('ThisIsMyString')
'this_is_my_string'
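The one-liner above puts an underscore before every capital, so acronyms come out letter by letter. A regex-free sketch that keeps runs of capitals together might look like the following; the function name and the exact boundary rules are assumptions made here, not part of the answer.

```python
def camel_to_snake_acronyms(s):
    out = []
    for i, c in enumerate(s):
        if c.isupper() and i > 0:
            prev = s[i - 1]
            next_is_lower = i + 1 < len(s) and s[i + 1].islower()
            # Break after a lowercase letter or digit, or before the last
            # capital of an acronym when a lowercase letter follows it.
            if prev.islower() or prev.isdigit() or (prev.isupper() and next_is_lower):
                out.append('_')
        out.append(c.lower())
    return ''.join(out)

print(camel_to_snake_acronyms('ThisIsMyString'))       # this_is_my_string
print(camel_to_snake_acronyms('MultinomialNB'))        # multinomial_nb
print(camel_to_snake_acronyms('getHTTPResponseCode'))  # get_http_response_code
```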
otocan
  • 8
    This is the most compact one which avoids using the `re` library and doing the stuff only in one line only using built-in str.methods! It is similar to [this answer](https://stackoverflow.com/a/28774760/2648551), but avoids using slicing and additional `if ... else` by simply stripping potentially added "_" as first character. I like this most. – colidyre Mar 07 '20 at 01:00
  • 4
    For accepted answer `6.81 µs ± 22.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)` but for this response `2.51 µs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)` which is 2.5x times faster! Love this! – WBAR Mar 20 '20 at 17:10
  • 5
    this will convert acronyms like "MultinomialNB" to "multinomial_n_b" instead of "multinomial_nb". – s2t2 Sep 06 '20 at 00:33
  • 2
    URL -> u_r_l, HTTP -> h_t_t_p (I realize I'm piling on a bit here but ...) – 0xbe5077ed Dec 17 '20 at 16:17
  • 1
    @WBAR it might be faster but there are tons of case it doesn't handle... examples: `HIThereHOWIsItGoing, how_are_YoU_TeST`. Though I agree it does do regular camel case to snake case well enough – rv.kvetch Jul 27 '21 at 06:27
  • @rv.kvetch how do you propose "HIThereHOWIsItGoing" could be turned from camel case to snake case? That is not in any sensible format. – mjuopperi Nov 01 '21 at 09:38
  • Fair point, maybe something like "hiThereHOWIsItGoing" which I believe should be camel case. – rv.kvetch Nov 01 '21 at 14:27
  • Here are a few adaptations to make it a bit more general: `lambda string: "".join(char if not char.isupper() else ("" if i and (substr := string[i-1] + (next_char or "")) == substr.upper() else "_") + char.lower() for i, (char, next_char) in enumerate(zip(string, [*string[1:], None]))).lstrip("_")`. This update is submitted as a lambda function since we cannot add newlines in this comment section. Works fine for cases like `"HelloWorld"`, `HelloSTACKOverflow`, and `HelloSTACKOverflowOMG` when using python 3.10+ – Rodrigo Hernández Mota Mar 20 '23 at 00:57
42

stringcase is my go-to library for this; e.g.:

>>> from stringcase import pascalcase, snakecase
>>> snakecase('FooBarBaz')
'foo_bar_baz'
>>> pascalcase('foo_bar_baz')
'FooBarBaz'
Beau
  • For `hello world` it will add double `_` as `hello__world` that is not good – Gonzalo Garcia May 26 '21 at 16:45
  • 1
    @GonzaloGarcia this is incorrect: `stringcase.snakecase('hello world')` returns `'hello_world'` (one underscore) – Beau May 26 '21 at 17:05
  • sorry, if the string is `hello world` it will add double `_` – Gonzalo Garcia May 26 '21 at 19:13
  • 1
    it's up to the user to pass correct arguments to the library :) – Beau May 27 '21 at 02:42
  • Sometimes the user doesn't have control over the input of functions. Robust libraries could handle better. – Gonzalo Garcia May 28 '21 at 16:42
  • 3
    double underscores are commonly used in Python; your "better" is someone else's "worse"; if you want a library with different semantics you can use one; your criticism is not part of the original question which is about CamelCase, not spaces – Beau May 29 '21 at 07:05
13

I think this solution is more straightforward than previous answers:

import re

def convert(camel_input):
    words = re.findall(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+', camel_input)
    return '_'.join(map(str.lower, words))


# Let's test it
test_strings = [
    'CamelCase',
    'camelCamelCase',
    'Camel2Camel2Case',
    'getHTTPResponseCode',
    'get200HTTPResponseCode',
    'getHTTP200ResponseCode',
    'HTTPResponseCode',
    'ResponseHTTP',
    'ResponseHTTP2',
    'Fun?!awesome',
    'Fun?!Awesome',
    '10CoolDudes',
    '20coolDudes'
]
for test_string in test_strings:
    print(convert(test_string))

Which outputs:

camel_case
camel_camel_case
camel_2_camel_2_case
get_http_response_code
get_200_http_response_code
get_http_200_response_code
http_response_code
response_http
response_http_2
fun_awesome
fun_awesome
10_cool_dudes
20_cool_dudes

The regular expression matches three patterns:

  1. [A-Z]?[a-z]+: Consecutive lower-case letters that optionally start with an upper-case letter.
  2. [A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$): Two or more consecutive upper-case letters. It uses a lookahead to exclude the last upper-case letter if it is followed by a lower-case letter.
  3. \d+: Consecutive numbers.

By using re.findall we get a list of individual "words" that can be converted to lower-case and joined with underscores.
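To make the tokenizing step concrete, here is a small sketch that prints the raw word list before it is lowercased and joined. The helper name `tokens` is made up for illustration. It also shows a gap in the pattern: a single trailing capital, as in `aB`, matches none of the three alternatives and is dropped.

```python
import re

# The three alternatives described above: a capitalized or lowercase word,
# an acronym of two or more capitals, or a run of digits.
TOKEN = re.compile(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+')

def tokens(camel_input):
    """Return the individual "words" found by the pattern (hypothetical helper)."""
    return TOKEN.findall(camel_input)

print(tokens('getHTTP200ResponseCode'))  # ['get', 'HTTP', '200', 'Response', 'Code']
print(tokens('HTTPResponseCode'))        # ['HTTP', 'Response', 'Code']
print(tokens('aB'))                      # ['a']
```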

rspeed
  • 1
    There is a good example here to get the Numerics tokenized independantly. – math_law Apr 30 '18 at 15:34
  • 4
    Broken: convert("aB") -> 'a' – adw Dec 02 '19 at 11:03
  • 1
    @adw This regex covers that case: `r"[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+|[A-Z]{2,}|[A-Z]$"` – abstrus Mar 10 '21 at 04:18
  • this works and it's actually the fastest version i could find in this thread tbh. the version in the comments supports all remaining edge cases too i think. – rv.kvetch Jun 05 '22 at 13:43
  • actually, I spoke too soon. does anyone know a regex that can handle words like `H@Y`, `H@y`, `WhaTSuP!`, or `B!KE`? – rv.kvetch Jun 05 '22 at 14:45
  • I found my fix. It looks like I just needed to remove the last `$` in the comment above. Also, this version handles words with numbers, like `t3st` or special cases with leading numbers, like `1nfinity` which was something I needed to account for in my code. The reason is that this last case isn't a valid identifier in python. Here's the regex to cover these edge cases: `r'\d[A-Z]+|[A-Z]?[a-z\d]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+|[A-Z]{2,}|[A-Z]'` – rv.kvetch Jun 05 '22 at 16:10
8

Personally I am not sure how anything using regular expressions in python can be described as elegant. Most answers here are just doing "code golf" type RE tricks. Elegant coding is supposed to be easily understood.

def to_snake_case(not_snake_case):
    final = ''
    for i in range(len(not_snake_case)):
        item = not_snake_case[i]
        if i < len(not_snake_case) - 1:
            next_char_will_be_underscored = (
                not_snake_case[i+1] == "_" or
                not_snake_case[i+1] == " " or
                not_snake_case[i+1].isupper()
            )
        if (item == " " or item == "_") and next_char_will_be_underscored:
            continue
        elif (item == " " or item == "_"):
            final += "_"
        elif item.isupper():
            final += "_"+item.lower()
        else:
            final += item
    if final[0] == "_":
        final = final[1:]
    return final

>>> to_snake_case("RegularExpressionsAreFunky")
'regular_expressions_are_funky'

>>> to_snake_case("RegularExpressionsAre Funky")
'regular_expressions_are_funky'

>>> to_snake_case("RegularExpressionsAre_Funky")
'regular_expressions_are_funky'
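The same character loop can also collect its pieces in a list and join them once at the end. The sketch below is a variation under assumptions rather than a drop-in replacement: the name `to_snake_case_joined` is made up, the look-ahead flag is recomputed for every character, and all leading underscores are stripped.

```python
def to_snake_case_joined(text):
    parts = []
    for i, item in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ''
        # The next character will produce its own underscore if it is a
        # separator or an uppercase letter.
        next_is_boundary = nxt in ('_', ' ') or nxt.isupper()
        if item in (' ', '_'):
            if not next_is_boundary:
                parts.append('_')
        elif item.isupper():
            parts.append('_' + item.lower())
        else:
            parts.append(item)
    return ''.join(parts).lstrip('_')

print(to_snake_case_joined('RegularExpressionsAre Funky'))  # regular_expressions_are_funky
```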
kevlarr
TehTris
  • 2
    `+=` on strings is almost always a bad idea. Append to a list and `''.join()` it in the end. Or in this case, simply join it with an underscore... – ThiefMaster Sep 25 '15 at 11:53
  • 39
    How is a single-line regular expression _not_ innately superior in just about every practical way (including readability) to inefficient multi-line character iteration and brute-force string munging? Python provides regular expression support out-of-the-box for a reason. – Cecil Curry Dec 21 '15 at 06:26
  • regular expression are **good** but I think that when we can do a simple function to replace it, it's better. – math2001 Nov 22 '16 at 07:44
  • 1
    @CecilCurry - Regular expressions are VERY complex. See the compiler and parser that Python uses: https://svn.python.org/projects/python/trunk/Lib/sre_compile.py & https://svn.python.org/projects/python/trunk/Lib/sre_parse.py -- Simple string manipulation like this likely much faster than an RE doing the same. – Evan Borgstrom Feb 27 '17 at 19:26
  • 1
    +1. Regexes can be a real CPU sink, and on intensive calculations will dramatically lower your performances. For simple tasks, always prefer simple functions. – Fabien Jul 04 '17 at 22:03
  • 7
    "For simple tasks, always prefer simple functions" is definitely good advice, but this answer is neither a simple function nor an elegant one. Regex might be slower, but defaulting to a complicated function like this (that is ALSO untested and has numerous potential points of error) is completely premature optimization – kevlarr Sep 04 '18 at 14:31
  • 2
    I'm sorry but this function is very hard to read compared to a simple regex. – mjuopperi Nov 01 '21 at 09:40
  • Down-voted this answer because it is not at elegant or sophisticated; rather verbose and not performant. I am not sure why the author says... "Elegant coding is supposed to be easily understood." The code uses bad practices like "+=" operators for strings. Also, xrange() is from Python 2.7, which was retired in January 2020. – Rich Lysakowski PhD Aug 23 '22 at 05:18
7
''.join('_'+c.lower() if c.isupper() else c for c in "DeathToCamelCase").strip('_')
re.sub("(.)([A-Z])", r'\1_\2', 'DeathToCamelCase').lower()
Jimmy
4

Here's my solution:

def un_camel(text):
    """ Converts a CamelCase name into an under_score name. 

        >>> un_camel('CamelCase')
        'camel_case'
        >>> un_camel('getHTTPResponseCode')
        'get_http_response_code'
    """
    result = []
    pos = 0
    while pos < len(text):
        if text[pos].isupper():
            if pos > 0 and text[pos-1].islower() or pos > 0 and \
            pos+1 < len(text) and text[pos+1].islower():
                result.append("_%s" % text[pos].lower())
            else:
                result.append(text[pos].lower())
        else:
            result.append(text[pos])
        pos += 1
    return "".join(result)

It supports those corner cases discussed in the comments. For instance, it'll convert getHTTPResponseCode to get_http_response_code like it should.

Evan Fosmark
4

I don't see why both .sub() calls are needed. :) I'm no regex guru, but I simplified the function to this one, which suits my needs. I just needed a solution to convert camelCasedVars from a POST request to vars_with_underscore:

def myFunc(name):
  return re.sub('(.)([A-Z]{1})', r'\1_\2', name).lower()

print(myFunc("iTriedToWriteNicely"))  # i_tried_to_write_nicely

It does not work with names like getHTTPResponse, but I have heard that is a bad naming convention anyway (it should be getHttpResponse, which is obviously much easier to memorize).

desper4do
  • I forgot to mention, that '{1}' is not needed, but sometimes it help clarify some mist. – desper4do Nov 13 '12 at 16:22
  • 2
    -1: this just doesn't work. Try with for example with `'HTTPConnectionFactory'`, your code produces `'h_tt_pconnection_factory'`, code from accepted answer produces `'http_connection_factory'` – vartec May 23 '13 at 09:57
3

For the fun of it:

>>> def un_camel(input):
...     output = [input[0].lower()]
...     for c in input[1:]:
...             if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
...                     output.append('_')
...                     output.append(c.lower())
...             else:
...                     output.append(c)
...     return str.join('', output)
...
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'

Or, more for the fun of it:

>>> un_camel = lambda i: i[0].lower() + str.join('', ("_" + c.lower() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" else c for c in i[1:]))
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'
gahooa
  • 3
    c.isupper() rather than c in ABCEF...Z – Jimmy Jul 24 '09 at 01:30
  • 1
    Python doesn't have regexes? A quick 's/[a-z]\K([A-Z][a-z])/_\L$1/g; lc $_' in Perl does the job (although it does not handle getHTTPResponseCode well; but that's expected, that should be named getHttpResponseCode) – jrockway Jul 24 '09 at 01:34
  • 5
    `str.join` has been deprecated for _ages_. Use `''.join(..)` instead. – John Fouhy Jul 24 '09 at 01:49
  • jrockway: It does have regular expressions, via the "re" module. It shouldn't be too difficult to make this work using regex rather than the approaches posted here. – Matthew Iselin Jul 24 '09 at 01:52
  • Python noob here, but why return str.join('', output)? Just to create a copy? – Tarks Jul 24 '09 at 02:00
  • Tarks, `output` is a list of characters, not a string. So running it through str.join with an empty string as the first parameter will compress the list into a string. – Evan Fosmark Jul 24 '09 at 02:17
  • @John: Why do you say that is deprecated? str.join("", []) is exactly the same as calling "".join([]). Bound vs. unbound. Same as calling ParentClass.__init__(self). However, I find the first more readable. Can you point to any docs on the depreciation? – gahooa Jul 24 '09 at 15:03
  • I like this approach, but why go via a list when you can just build the output as a string? – Robin Andrews May 12 '20 at 15:56
3

Using regexes may be the shortest, but this solution is way more readable:

def to_snake_case(s):
    snake = "".join(["_"+c.lower() if c.isupper() else c for c in s])
    return snake[1:] if snake.startswith("_") else snake
3k-
  • @blueyed that's completely unrelated, this question has nothing to do with django. – 3k- Apr 20 '15 at 09:51
  • It's just an example, like HTTPResponseCode, which is handled by http://stackoverflow.com/a/23561109/15690. – blueyed Apr 22 '15 at 19:43
2

This is not an elegant method; it is a very 'low level' implementation of a simple state machine (a bitfield state machine), and possibly the most anti-Pythonic way to solve this. However, the re module also implements an overly complex state machine for this simple task, so I think this is a good solution.

def splitSymbol(s):
    si, ci, state = 0, 0, 0 # start_index, current_index 
    '''
        state bits:
        0: no yields
        1: lower yields
        2: lower yields - 1
        4: upper yields
        8: digit yields
        16: other yields
        32 : upper sequence mark
    '''
    for c in s:

        if c.islower():
            if state & 1:
                yield s[si:ci]
                si = ci
            elif state & 2:
                yield s[si:ci - 1]
                si = ci - 1
            state = 4 | 8 | 16
            ci += 1

        elif c.isupper():
            if state & 4:
                yield s[si:ci]
                si = ci
            if state & 32:
                state = 2 | 8 | 16 | 32
            else:
                state = 8 | 16 | 32

            ci += 1

        elif c.isdigit():
            if state & 8:
                yield s[si:ci]
                si = ci
            state = 1 | 4 | 16
            ci += 1

        else:
            if state & 16:
                yield s[si:ci]
            state = 0
            ci += 1  # eat ci
            si = ci   
        print(' : ', c, bin(state))
    if state:
        yield s[si:ci] 


def camelcaseToUnderscore(s):
    return '_'.join(splitSymbol(s)) 

splitSymbol can parse all case types: UpperSEQUENCEInterleaved, under_score, BIG_SYMBOLS and camelCasedMethods

I hope it is useful

jdavidls
2

Take a look at the excellent Schematics lib

https://github.com/schematics/schematics

It allows you to create typed data structures that can serialize/deserialize from Python to JavaScript flavour, e.g.:

class MapPrice(Model):
    price_before_vat = DecimalType(serialized_name='priceBeforeVat')
    vat_rate = DecimalType(serialized_name='vatRate')
    vat = DecimalType()
    total_price = DecimalType(serialized_name='totalPrice')
Iain Hunter
2

So many complicated methods... Just find all the "Titled" groups and join their lower-cased variants with underscores.

>>> import re
>>> def camel_to_snake(string):
...     groups = re.findall('([A-Za-z0-9][a-z]*)', string)
...     return '_'.join([i.lower() for i in groups])
...
>>> camel_to_snake('ABCPingPongByTheWay2KWhereIsOurBorderlands3???')
'a_b_c_ping_pong_by_the_way_2_k_where_is_our_borderlands_3'

If you don't want numbers to start a group or form a separate group, you can use the ([A-Za-z][a-z0-9]*) mask.

unitto
  • 23
  • 5
1

A horrendous example using regular expressions (you could easily clean this up :) ):

import re

def f(s):
    return s.group(1).lower() + "_" + s.group(2).lower()

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(f, "CamelCase"))
print(p.sub(f, "getHTTPResponseCode"))
print p.sub(f, "getHTTPResponseCode")

It also runs on getHTTPResponseCode, although the result is gethttpresponse_code rather than get_http_response_code.

Alternatively, using lambda:

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "CamelCase"))
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "getHTTPResponseCode"))

EDIT: It should also be pretty easy to see that there's room for improvement for cases like "Test", because the underscore is unconditionally inserted.
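One way to address that, as a rough sketch: emit the underscore only when the second group actually captured a capital letter. The function name `join_groups` is an assumption for this example; the pattern is the one used above.

```python
import re

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")

def join_groups(m):
    # Insert the underscore only if a following capital was captured.
    head, tail = m.group(1).lower(), m.group(2).lower()
    return head + "_" + tail if tail else head

print(p.sub(join_groups, "Test"))       # test
print(p.sub(join_groups, "CamelCase"))  # camel_case
```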

Matthew Iselin
  • 10,400
  • 4
  • 51
  • 62
1

This simple method should do the job:

import re

def convert(name):
    return re.sub(r'([A-Z]*)([A-Z][a-z]+)', lambda x: (x.group(1) + '_' if x.group(1) else '') + x.group(2) + '_', name).rstrip('_').lower()
  • We look for capital letters that are preceded by any number of (or zero) capital letters, and followed by one or more lowercase characters.
  • An underscore is placed after each match, and another one just before the last capital letter of the match when that letter is preceded by other capital letters.
  • If there are trailing underscores, remove those.
  • Finally, the whole result string is changed to lower case.

(taken from here, see working example online)

Mathieu Rodic
  • 6,637
  • 2
  • 43
  • 49
1

Lightly adapted from https://stackoverflow.com/users/267781/matth, who uses generators.

def uncamelize(s):
    buff, l = '', []
    for ltr in s:
        if ltr.isupper():
            if buff:
                l.append(buff)
                buff = ''
        buff += ltr
    l.append(buff)
    return '_'.join(l).lower()
Community
  • 1
  • 1
Salvatore
  • 61
  • 4
0

I was looking for a solution to the same problem, except that I needed a chain; e.g.

"CamelCamelCamelCase" -> "Camel-camel-camel-case"

Starting from the nice two-word solutions here, I came up with the following:

"-".join(x.group(1).lower() if x.group(2) is None else x.group(1) \
         for x in re.finditer("((^.[^A-Z]+)|([A-Z][^A-Z]+))", "stringToSplit"))

Most of the complicated logic is to avoid lowercasing the first word. Here's a simpler version if you don't mind altering the first word:

"-".join(x.group(1).lower() for x in re.finditer("(^[^A-Z]+|[A-Z][^A-Z]+)", "stringToSplit"))

Of course, you can pre-compile the regular expressions or join with underscore instead of hyphen, as discussed in the other solutions.
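As a sketch of that last remark, here is the simpler version with the pattern compiled once and an underscore as the separator. The names `WORD` and `to_underscore` are assumptions made for this example.

```python
import re

# Compiled once and reused on every call.
WORD = re.compile("(^[^A-Z]+|[A-Z][^A-Z]+)")

def to_underscore(s):
    # Lowercase every matched word and join the words with underscores.
    return "_".join(m.group(1).lower() for m in WORD.finditer(s))

print(to_underscore("stringToSplit"))        # string_to_split
print(to_underscore("CamelCamelCamelCase"))  # camel_camel_camel_case
```

Note that finditer skips any character that matches neither alternative, so a run of capitals such as HTTP is silently dropped by this pattern.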

Jim Pivarski
  • 5,568
  • 2
  • 35
  • 47
0

Concise, without regular expressions, but HTTPResponseCode => httpresponse_code:

def from_camel(name):
    """
    ThisIsCamelCase ==> this_is_camel_case
    """
    name = name.replace("_", "")
    _cas = lambda _x : [_i.isupper() for _i in _x]
    seq = zip(_cas(name[1:-1]), _cas(name[2:]))
    ss = [_x + 1 for _x, (_i, _j) in enumerate(seq) if (_i, _j) == (False, True)]
    return "".join([ch + "_" if _x in ss else ch for _x, ch in enumerate(name.lower())])
Dantalion
  • 323
  • 1
  • 2
  • 7
0

Without any library:

def camelify(out):
    return (''.join(["_"+x.lower() if i<len(out)-1 and x.isupper() and out[i+1].islower()
         else x.lower()+"_" if i<len(out)-1 and x.islower() and out[i+1].isupper()
         else x.lower() for i,x in enumerate(list(out))])).lstrip('_').replace('__','_')

A bit heavy, but

CamelCamelCamelCase ->  camel_camel_camel_case
HTTPRequest         ->  http_request
GetHTTPRequest      ->  get_http_request
getHTTPRequest      ->  get_http_request
0

Just in case someone needs to transform a complete source file, here is a script that will do it.

# Copy and paste your camel case code in the string below
camelCaseCode ="""
    cv2.Matx33d ComputeZoomMatrix(const cv2.Point2d & zoomCenter, double zoomRatio)
    {
      auto mat = cv2.Matx33d::eye();
      mat(0, 0) = zoomRatio;
      mat(1, 1) = zoomRatio;
      mat(0, 2) = zoomCenter.x * (1. - zoomRatio);
      mat(1, 2) = zoomCenter.y * (1. - zoomRatio);
      return mat;
    }
"""

import re
def snake_case(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

def lines(str):
    return str.split("\n")

def unlines(lst):
    return "\n".join(lst)

def words(str):
    return str.split(" ")

def unwords(lst):
    return " ".join(lst)

def map_partial(function):
    return lambda values : [  function(v) for v in values]

import functools
def compose(*functions):
    return functools.reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)

snake_case_code = compose(
    unlines ,
    map_partial(unwords),
    map_partial(map_partial(snake_case)),
    map_partial(words),
    lines
)
print(snake_case_code(camelCaseCode))
Pascal T.
  • 3,866
  • 4
  • 33
  • 36
0

Wow I just stole this from django snippets. ref http://djangosnippets.org/snippets/585/

Pretty elegant

import re

camelcase_to_underscore = lambda str: re.sub(r'(?<=[a-z])[A-Z]|[A-Z](?=[^A-Z])', r'_\g<0>', str).lower().strip('_')

Example:

camelcase_to_underscore('ThisUser')

Returns:

'this_user'

REGEX DEMO

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
brianray
  • 1,139
  • 1
  • 12
  • 16
0

Very nice RegEx proposed on this site:

(?<!^)(?=[A-Z])

If Python has a string split method, it should work...

In Java:

String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
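For what it's worth, Python's re.split accepts this pattern directly from version 3.7 on, where splitting on empty matches is supported; older versions do not support it. A minimal sketch, assuming Python 3.7 or newer:

```python
import re

# Requires Python 3.7+, where re.split can split on zero-length matches.
words = re.split(r'(?<!^)(?=[A-Z])', 'loremIpsum')
print(words)                    # ['lorem', 'Ipsum']
print('_'.join(words).lower())  # lorem_ipsum
```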
Mathieu Rodic
  • 6,637
  • 2
  • 43
  • 49
Jmini
  • 9,189
  • 2
  • 55
  • 77
  • Unfortunately, the Python regular expression module doesn't (as of version 3.6) support splitting on zero-length matches. – rspeed Oct 05 '17 at 02:25
0

Here's something I did to change the headers on a tab-delimited file. I'm omitting the part where I only edited the first line of the file. You could adapt it to Python pretty easily with the re library. This also includes separating out numbers (but keeps the digits together). I did it in two steps because that was easier than telling it not to put an underscore at the start of a line or tab.

Step One...find uppercase letters or integers preceded by lowercase letters, and precede them with an underscore:

Search:

([a-z]+)([A-Z]|[0-9]+)

Replacement:

\1_\l\2/

Step Two...take the above and run it again to convert all caps to lowercase:

Search:

([A-Z])

Replacement (that's backslash, lowercase L, backslash, one):

\l\1
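A possible Python adaptation of the two steps is sketched below. Python's re module has no \l escape in replacement strings, so the lowercasing is done in a replacement function instead; the function name `snake_header` and the sample header line are made up for this example.

```python
import re

def snake_header(line):
    # Step one: put an underscore before an uppercase letter or a run of
    # digits that follows lowercase letters, lowercasing that letter.
    line = re.sub(r'([a-z]+)([A-Z]|[0-9]+)',
                  lambda m: m.group(1) + '_' + m.group(2).lower(), line)
    # Step two: lowercase any capitals that are left.
    return re.sub(r'[A-Z]', lambda m: m.group(0).lower(), line)

# Tabs are left alone, so a tab-delimited header keeps its columns.
print(snake_header('firstName\tLastName\tzipCode5'))
```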
Joe Tricarico
  • 363
  • 2
  • 5
-1

If you follow Google's (nearly) deterministic camel case algorithm, you do not need to handle names like HTMLDocument, since they should be written HtmlDocument, and a simple regex-based approach is enough. It prefixes every capital letter or digit with an underscore. Note that it does not handle multi-digit numbers.

import re

def to_snake_case(camel_str):
    return re.sub('([A-Z0-9])', r'_\1', camel_str).lower().lstrip('_')
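If multi-digit numbers matter, one possible variation is to match a whole run of digits as a single unit. This is only a sketch; the function name `to_snake_case_multi_digit` is an assumption for this example.

```python
import re

def to_snake_case_multi_digit(camel_str):
    # Treat a run of digits as one unit, so 'Version10' gives 'version_10'.
    return re.sub(r'([A-Z]|[0-9]+)', r'_\1', camel_str).lower().lstrip('_')

print(to_snake_case_multi_digit('HtmlDocument'))   # html_document
print(to_snake_case_multi_digit('Version10Beta'))  # version_10_beta
```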
run_the_race
  • 1,344
  • 2
  • 36
  • 62
codekoala
  • 715
  • 11
  • 17
-1

Not in the standard library, but I found this module that appears to contain the functionality you need.

Smart Manoj
  • 5,230
  • 4
  • 34
  • 59
Stefano Borini
  • 138,652
  • 96
  • 297
  • 431
  • 2
    No this is an empty package that does nothing. I would love for you to prove me wrong though with a code example. – CornSmith Jun 16 '21 at 12:52
  • This repo looks completely unmaintained in years. Also it's a whole script that appears to operate on files and doesn't even use argparse. – DeusXMachina Jan 25 '22 at 16:35
-1
from functools import reduce  # reduce is not a builtin in Python 3

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() else '') + y, 
        name
    ).lower()

And if we need to cover a case with already-un-cameled input:

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() and not x.endswith('_') else '') + y, 
        name
    ).lower()
dmrz
  • 2,243
  • 1
  • 16
  • 9
-2
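def convert(camel_str):
    temp_list = []
    for letter in camel_str:
        # Prefix each uppercase letter with an underscore.
        if letter.isupper():
            temp_list.append('_')
        temp_list.append(letter)
    result = "".join(temp_list)
    # Drop the underscore produced by a leading capital, e.g. in 'CamelCase'.
    return result.lower().lstrip('_')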
JohnBoy
  • 123
  • 1
  • 2
  • 11
-5

Use str.capitalize() to convert the first letter of a string to a capital letter (the remaining letters are lowercased); it returns the resulting string.

Example: Command: "hello".capitalize() Output: Hello

Arshin
  • 1,050
  • 1
  • 9
  • 10