How can I simplify this conversion from underscore to camelcase in Python?

Question

I have written the function below that converts underscore to camelcase with first word in lowercase, i.e. "get_this_value" -> "getThisValue". Also I have requirement to preserve leading and trailing underscores and also double (triple etc.) underscores, if any, i.e.

"_get__this_value_" -> "_get_ThisValue_".

The code:

def underscore_to_camelcase(value):
    output = ""
    first_word_passed = False
    for word in value.split("_"):
        if not word:
            output += "_"
            continue
        if first_word_passed:
            output += word.capitalize()
        else:
            output += word.lower()
        first_word_passed = True
    return output

I am feeling the code above as written in non-Pythonic style, though it works as expected, so looking how to simplify the code and write it using list comprehensions etc.

your question in unclear; do you want to convert "get_this_value" into "getThisValue" or "_get_ThisValue_" — tuergeist, Nov 29 '10 at 12:05
"get_this_value" -> "getThisValue", the first variant. No underscores unless they are at the beginning, at the end or there are two or more underscores following each other. — Serge Tarkovski, Nov 29 '10 at 12:20
Are you sure your function works as expected? You're saying that you want to *preserve* double, triple... underscores, however what your code is doing is cutting one underscore off. I.e. from get__this_value, you get get_ThisValue and from get____this, there's get___This. Is this what you want? — ecik, Nov 29 '10 at 13:55
See related question: http://stackoverflow.com/questions/1175208/does-the-python-standard-library-have-function-to-convert-camelcase-to-camel-case — Adam Schmideg, Jan 02 '11 at 00:50

score 66 · Answer 1 · answered Jun 21 '11 at 13:02

66

This one works except for leaving the first word as lowercase.

def convert(word):
    return ''.join(x.capitalize() or '_' for x in word.split('_'))

(I know this isn't exactly what you asked for, and this thread is quite old, but since it's quite prominent when searching for such conversions on Google I thought I'd add my solution in case it helps anyone else).

answered Jun 21 '11 at 13:02

Siegfried Gevatter

3,624
3
18
13

3

This is excellent. For the novice Python user, if you have words separated by any one symbol (in my case, a space) you can replace the `_` in both locations to use this functionality to give you camel case in other situations. +1 – BlackVegetable Feb 02 '13 at 02:33
7

`return word.split('_')[0] + ''.join(x.capitalize() or '_' for x in word.split('_')[1:)` – Timothy Aaron Jun 06 '16 at 15:25
@TimothyAaron syntax fix -- `return word.split('_')[0] + ''.join(x.capitalize() or '_' for x in word.split('_')[1:])` -- missing the final `]` – Alvin Aug 19 '19 at 23:58
Why is `or '_'` needed? – user2340939 Jan 11 '22 at 15:43
I think for python 3 you need '_'.join(...) and not the or '_'; the or is only used if you have the capitalization fail, as in, this word cannot be capitalized because it is empty, which would introduce a joining _, but if you have multiple underscores in a row, then the or '_' will give 2 '_' for each successive '_' less 1. That's why '_'.join is better and a '_' before the '_'.join(...) to connect the first, not-capitalized element to the rest. – TimeHorse Jun 22 '22 at 20:56

David Webb · Accepted Answer · 2010-11-30T09:38:52.073

33

Your code is fine. The problem I think you're trying to solve is that if first_word_passed looks a little bit ugly.

One option for fixing this is a generator. We can easily make this return one thing for first entry and another for all subsequent entries. As Python has first-class functions we can get the generator to return the function we want to use to process each word.

We then just need to use the conditional operator so we can handle the blank entries returned by double underscores within a list comprehension.

So if we have a word we call the generator to get the function to use to set the case, and if we don't we just use _ leaving the generator untouched.

def underscore_to_camelcase(value):
    def camelcase(): 
        yield str.lower
        while True:
            yield str.capitalize

    c = camelcase()
    return "".join(c.next()(x) if x else '_' for x in value.split("_"))

edited Nov 30 '10 at 09:38

answered Nov 29 '10 at 18:25

David Webb

190,537
57
313
299

Nice but FWIW str.lower and str.capitalize don't require an import. – freegnu Nov 29 '10 at 20:45
@freegnu - Thanks. Have updated the code to use `str.lower` and `str.capitalize`. – David Webb Nov 30 '10 at 09:39
3

Using `str.` makes it not work with unicode objects. Try replacing `str` with `type(value)`. – Sam Hartsfield Mar 12 '12 at 15:57

score 25 · Answer 3 · answered Jun 11 '12 at 17:40

25

I prefer a regular expression, personally. Here's one that is doing the trick for me:

import re
def to_camelcase(s):
    return re.sub(r'(?!^)_([a-zA-Z])', lambda m: m.group(1).upper(), s)

Using unutbu's tests:

tests = [('get__this_value', 'get_ThisValue'),
         ('_get__this_value', '_get_ThisValue'),
         ('_get__this_value_', '_get_ThisValue_'),
         ('get_this_value', 'getThisValue'),
         ('get__this__value', 'get_This_Value')]

for test, expected in tests:
    assert to_camelcase(test) == expected

answered Jun 11 '12 at 17:40

codekoala

715
11
17

1

This is a nice solution, as it does not mess up strings that are already in camel case. – ishmael Sep 08 '13 at 21:22
It changed **SCRIPT_CUSTOM** to **SCRIPTCUSTOM** why ? everything else seems working fine – Haseeb Mir Aug 26 '21 at 18:36
Technically that's working as expected: it remove the underscore and ensured that the following letter was capitalized. If you're looking for `SCRIPT_CUSTOM` to become `scriptCustom`, simply pass it in using `.lower()`: `to_camelcase('SCRIPT_CUSTOM'.lower())` – codekoala Aug 26 '21 at 22:39

Cari · Answer 4 · 2015-04-13T15:02:33.870

Here's a simpler one. Might not be perfect for all situations, but it meets my requirements, since I'm just converting python variables, which have a specific format, to camel-case. This does capitalize all but the first word.

def underscore_to_camelcase(text):
"""
Converts underscore_delimited_text to camelCase.
Useful for JSON output
"""
    return ''.join(word.title() if i else word for i, word in enumerate(text.split('_')))

score 3 · Answer 5 · answered Sep 09 '13 at 12:15

This algorithm performs well with digit:

import re

PATTERN = re.compile(r'''
    (?<!\A) # not at the start of the string
    _
    (?=[a-zA-Z]) # followed by a letter
    ''', re.X)

def camelize(value):
    tokens = PATTERN.split(value)
    response = tokens.pop(0).lower()
    for remain in tokens:
        response += remain.capitalize()
    return response

Examples:

>>> camelize('Foo')
'foo'
>>> camelize('_Foo')
'_foo'
>>> camelize('Foo_')
'foo_'
>>> camelize('Foo_Bar')
'fooBar'
>>> camelize('Foo__Bar')
'foo_Bar'
>>> camelize('9')
'9'
>>> camelize('9_foo')
'9Foo'
>>> camelize('foo_9')
'foo_9'
>>> camelize('foo_9_bar')
'foo_9Bar'
>>> camelize('foo__9__bar')
'foo__9_Bar'

score 3 · Answer 6 · answered Dec 16 '15 at 16:06

Here's mine, relying mainly on list comprehension, split, and join. Plus optional parameter to use different delimiter:

def underscore_to_camel(in_str, delim="_"):
    chunks = in_str.split(delim)
    chunks[1:] = [_.title() for _ in chunks[1:]]
    return "".join(chunks)

Also, for sake of completeness, including what was referenced earlier as solution from another question as the reverse (NOT my own code, just repeating for easy reference):

first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def camel_to_underscore(in_str):
    s1 = first_cap_re.sub(r'\1_\2', name)
    return all_cap_re.sub(r'\1_\2', s1).lower()

Gareth Rees · Answer 7 · 2010-11-29T12:54:20.967

I think the code is fine. You've got a fairly complex specification, so if you insist on squashing it into the Procrustean bed of a list comprehension, then you're likely to harm the clarity of the code.

The only changes I'd make would be:

To use the join method to build the result in O(n) space and time, rather than repeated applications of += which is O(n²).
To add a docstring.

Like this:

def underscore_to_camelcase(s):
    """Take the underscore-separated string s and return a camelCase
    equivalent.  Initial and final underscores are preserved, and medial
    pairs of underscores are turned into a single underscore."""
    def camelcase_words(words):
        first_word_passed = False
        for word in words:
            if not word:
                yield "_"
                continue
            if first_word_passed:
                yield word.capitalize()
            else:
                yield word.lower()
            first_word_passed = True
    return ''.join(camelcase_words(s.split('_')))

Depending on the application, another change I would consider making would be to memoize the function. I presume you're automatically translating source code in some way, and you expect the same names to occur many times. So you might as well store the conversion instead of re-computing it each time. An easy way to do that would be to use the @memoized decorator from the Python decorator library.

score 2 · Answer 8 · edited Nov 29 '10 at 12:29

I agree with Gareth that the code is ok. However, if you really want a shorter, yet readable approach you could try something like this:

def underscore_to_camelcase(value):
    # Make a list of capitalized words and underscores to be preserved
    capitalized_words = [w.capitalize() if w else '_' for w in value.split('_')]

    # Convert the first word to lowercase
    for i, word in enumerate(capitalized_words):
        if word != '_':
            capitalized_words[i] = word.lower()
            break

    # Join all words to a single string and return it
    return "".join(capitalized_words)

score 2 · Answer 9 · answered Nov 29 '10 at 13:18

The problem calls for a function that returns a lowercase word the first time, but capitalized words afterwards. You can do that with an if clause, but then the if clause has to be evaluated for every word. An appealing alternative is to use a generator. It can return one thing on the first call, and something else on successive calls, and it does not require as many ifs.

def lower_camelcase(seq):
    it=iter(seq)
    for word in it:
        yield word.lower()
        if word.isalnum(): break
    for word in it:
        yield word.capitalize()

def underscore_to_camelcase(text):
    return ''.join(lower_camelcase(word if word else '_' for word in text.split('_')))

Here is some test code to show that it works:

tests=[('get__this_value','get_ThisValue'),
       ('_get__this_value','_get_ThisValue'),
       ('_get__this_value_','_get_ThisValue_'),
       ('get_this_value','getThisValue'),        
       ('get__this__value','get_This_Value'),        
       ]
for test,answer in tests:
    result=underscore_to_camelcase(test)
    try:
        assert result==answer
    except AssertionError:
        print('{r!r} != {a!r}'.format(r=result,a=answer))

score 1 · Answer 10 · answered Nov 29 '10 at 13:41

Here is a list comprehension style generator expression.

from itertools import count
def underscore_to_camelcase(value):
    words = value.split('_')
    counter = count()
    return ''.join('_' if w == '' else w.capitalize() if counter.next() else w for w in words )

score 1 · Answer 11 · answered Nov 22 '20 at 08:41

    def convert(word):
        if not isinstance(word, str):
            return word
        if word.startswith("_"):
            word = word[1:]
        words = word.split("_")
        _words = []
        for idx, _word in enumerate(words):
            if idx == 0:
                _words.append(_word)
                continue
            _words.append(_word.capitalize())
        return ''.join(_words)

score 0 · Answer 12 · answered Nov 29 '10 at 12:46

0

This is the most compact way to do it:

def underscore_to_camelcase(value):
    words = [word.capitalize() for word in value.split('_')]
    words[0]=words[0].lower()
    return "".join(words)

answered Nov 29 '10 at 12:46

vonPetrushev

5,457
6
39
51

This fails to meet the spec! (For `_get_this_value` it returns `GetThisValue` whereas `_getThisValue` is wanted.) – Gareth Rees Nov 29 '10 at 12:49

Jocelyn delalande · Answer 13 · 2010-11-29T21:53:08.217

0

For regexp sake !

import re

def underscore_to_camelcase(value):
    def rep(m):
        if m.group(1) != None:
            return m.group(2) + m.group(3).lower() + '_'
        else:
            return m.group(3).capitalize()

    ret, nb_repl = re.subn(r'(^)?(_*)([a-zA-Z]+)', rep, value)
    return ret if (nb_repl > 1) else ret[:-1]

edited Nov 29 '10 at 21:53

answered Nov 29 '10 at 12:55

Jocelyn delalande

5,123
3
30
34

Fails to meet the spec! For `get__this_value` it returns `getThisValue` whereas `get_ThisValue` is wanted. – Gareth Rees Nov 29 '10 at 12:58
Fixed again (and please, drop the systematic exclamation mark, this is unpleasant). – Jocelyn delalande Nov 29 '10 at 21:54

score 0 · Answer 14 · answered Nov 29 '10 at 13:36

0

Another regexp solution:

import re

def conv(s):
    """Convert underscore-separated strings to camelCase equivalents.

    >>> conv('get')
    'get'
    >>> conv('_get')
    '_get'
    >>> conv('get_this_value')
    'getThisValue'
    >>> conv('__get__this_value_')
    '_get_ThisValue_'
    >>> conv('_get__this_value__')
    '_get_ThisValue_'
    >>> conv('___get_this_value')
    '_getThisValue'

    """
    # convert case:
    s = re.sub(r'(_*[A-Z])', lambda m: m.group(1).lower(), s.title(), count=1)
    # remove/normalize underscores:
    s = re.sub(r'__+|^_+|_+$', '|', s).replace('_', '').replace('|', '_')
    return s

if __name__ == "__main__":
    import doctest
    doctest.testmod()

It works for your examples, but it might fail for names containting digits - it depends how you would capitalize them.

answered Nov 29 '10 at 13:36

Oben Sonne

9,893
2
40
61

@freegnu: Could you be more precise? – Oben Sonne Nov 30 '10 at 10:41
@'Oben Sonne' Even though the example given by the requester drops consecutive underscores the requested specification says to preserve double and triple underscores. The example given by the requester is probably wrong. – freegnu May 17 '11 at 05:31
@freegnu: Well, you have your specific interpretation of an ambiguous question and based on this you mark my answer as wrong? My answer is a suggestion how to solve the OP's problem and the doctests clearly communicate for what spec it works (*preserve double and triple underlines* interpreted as *compress them to one*). So what's the problem? Please relax and be more communicative in the first place when criticizing answers. – Oben Sonne May 17 '11 at 07:58

score 0 · Answer 15 · answered Nov 30 '10 at 01:59

A slightly modified version:

import re

def underscore_to_camelcase(value):
    first = True
    res = []

    for u,w in re.findall('([_]*)([^_]*)',value):
        if first:
            res.append(u+w)
            first = False
        elif len(w)==0:    # trailing underscores
            res.append(u)
        else:   # trim an underscore and capitalize
            res.append(u[:-1] + w.title())

    return ''.join(res)

score 0 · Answer 16 · answered Jun 08 '18 at 15:09

I know this has already been answered, but I came up with some syntactic sugar that handles a special case that the selected answer does not (words with dunders in them i.e. "my_word__is_____ugly" to "myWordIsUgly"). Obviously this can be broken up into multiple lines but I liked the challenge of getting it on one. I added line breaks for clarity.

def underscore_to_camel(in_string):
return "".join(
    list(
        map(
            lambda index_word:
                index_word[1].lower() if index_word[0] == 0
                else index_word[1][0].upper() + (index_word[1][1:] if len(index_word[1]) > 0 else ""),
            list(enumerate(re.split(re.compile(r"_+"), in_string)
                           )
                 )
        )
    )
)

score 0 · Answer 17 · answered Jul 21 '22 at 10:39

Maybe, pydash works for this purpose (https://pydash.readthedocs.io/en/latest/)

>>> from pydash.strings import snake_case 
>>>> snake_case('needToBeSnakeCased')
'get__this_value'
>>> from pydash.strings import camel_case
>>>camel_case('_get__this_value_')
'getThisValue'

How can I simplify this conversion from underscore to camelcase in Python?

17 Answers17