Python function that works the same as s.upper

Question

So I am trying to create a function that capitalizes all of the letters of a string the same way s.upper would, but written as my own function. I want to use ord() and chr(), stating that if the ord() value of a character is > 90, it should be replaced with the character whose value is 32 less than the original. I feel like I have some of the pieces, but I'm not sure how to actually put them together. I know I need a string accumulator, but how to fit it all together is not coming to me. So far I have this:

 def Uppercase(s):
     x = ''
     for ch in s:
     x = -----> confused about what the accumulation would be
     if ch ord() > 91:
         s.replace(ch, chr(ord())-----> not sure that this is possible to implement

You could just write `Uppercase = str.upper` if you want to use it as a function. (If you're doing this as a learning experience, of course, that's not an answer, and I suspect you are, which is why I wrote this as a comment.) — abarnert, May 06 '13 at 21:56

Ashwini Chaudhary · Accepted Answer · 2013-05-06T22:06:49.193

If a character's ord() value lies between 97 and 122 (both inclusive), then you can subtract 32 from it to get the corresponding upper case letter.
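To see where the 32 comes from, here is a quick check you can run in the interpreter. It only uses the built-in ord() and chr(); the name `offset` is just for illustration.

```python
# ASCII lowercase letters occupy code points 97..122, uppercase 65..90.
print(ord('a'), ord('z'))   # 97 122
print(ord('A'), ord('Z'))   # 65 90

# The gap between each lowercase letter and its uppercase form is constant.
offset = ord('a') - ord('A')
print(offset)               # 32

# Subtracting the offset and converting back yields the uppercase letter.
print(chr(ord('q') - offset))  # Q
```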

A one-liner using str.join and a list comprehension:

>>> def upper_case(s):
...     return "".join([chr(ord(x) - 32) if 97 <= ord(x) <= 122 else x for x in s])
...

>>> upper_case("foo bar")
'FOO BAR'

A more readable version:

>>> def upper_case(s):
...     new_strs = []
...     for char in s:
...         ordi = ord(char)
...         if 97 <= ordi <= 122:
...             new_strs.append(chr(ordi - 32))
...         else:
...             new_strs.append(char)
...     return "".join(new_strs)  # join the list using str.join and return
...
>>> upper_case("foo bar")
'FOO BAR'

>>> from string import ascii_lowercase
>>> upper_case(ascii_lowercase)
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

edited May 06 '13 at 22:06

answered May 06 '13 at 21:32


Although this is a perfectly good use for a generator expression, I suspect that making it a list comprehension will be easier to understand for the OP. – Daniel Roseman May 06 '13 at 21:34
This is incredibly unclear what it is doing. Just because you can write it in one line, doesn't mean that you should. – joneshf May 06 '13 at 21:35
The generator expression version for `''.join()` is *not* more memory efficient. `''.join()` *will always* turn the sequence into a list if it is not a list yet before creating the output string, because it needs to pre-allocate a string big enough to hold all of the input strings. Using a list comprehension is going to be faster, but not use any more memory. – Martijn Pieters May 06 '13 at 21:55
@MartijnPieters: I think that's just an artifact of CPython (because the performance benefits of preallocating the string and using the `PySequence_Fast` protocol outweigh the costs of generating the list in the first place), not something guaranteed by the language. – abarnert May 06 '13 at 22:01
@MartijnPieters Thanks, I never knew about that. Raymond Hettinger [confirms it](http://stackoverflow.com/a/9061024/846892). – Ashwini Chaudhary May 06 '13 at 22:03
@MartijnPieters: Looking at [PyPy](https://bitbucket.org/pypy/pypy/src/05c73d13ade55bc4c8ee755e7f6399c524f05dbe/pypy/objspace/std/stringobject.py?at=default#cl-378), it creates a "listview" object, which is similar to using `PySequence_Fast` in CPython, and then in some cases will "rebuild w_list here, because the original w_list might be an iterable which we already consumed", and the JIT will skip some steps if you can get the length in advance, so I'd guess it's probably true there too… but I'd want to test to be sure. – abarnert May 06 '13 at 22:07
@abarnert: I'd expect all implementations to make this optimization. Only if creating a dynamically sized string is cheaper than first calculating the output size can you get away with only looping over the input sequence just once. – Martijn Pieters May 06 '13 at 22:36
@MartijnPieters: It's at least _conceivable_, if not _likely_, that using a Java/.NET StringBuilder, or some similar custom C or Python or RPython thing, could have less overhead (in some cases) than iterating over a sequence twice and creating extra objects for the GC to collect. – abarnert May 06 '13 at 22:52
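
If you want to check the list comprehension vs. generator expression claim on your own interpreter rather than take it on faith, a rough timing sketch follows. The function names `upper_listcomp` and `upper_genexp` and the sample string are made up for this comparison, and the timings will vary by machine and Python implementation, so no expected numbers are given.

```python
import timeit

def upper_listcomp(s):
    # ''.join() receives a ready-made list.
    return "".join([chr(ord(x) - 32) if "a" <= x <= "z" else x for x in s])

def upper_genexp(s):
    # ''.join() receives a generator and has to consume it first.
    return "".join(chr(ord(x) - 32) if "a" <= x <= "z" else x for x in s)

sample = "foo bar " * 1000

# Both forms must produce the same string; only the timing may differ.
assert upper_listcomp(sample) == upper_genexp(sample)

list_time = timeit.timeit(lambda: upper_listcomp(sample), number=200)
gen_time = timeit.timeit(lambda: upper_genexp(sample), number=200)
print(f"list comprehension:   {list_time:.4f}s")
print(f"generator expression: {gen_time:.4f}s")
```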

Martijn Pieters · Answer 2 · 2013-05-06T22:04:30.830

Use a list, then join the individual characters together again:

def Uppercase(s):
    result = []
    for ch in s:
        value = ord(ch)
        if 97 <= value <= 122:
            value -= 32
        result.append(chr(value))

    return ''.join(result)

My version only changes characters whose ord() value is between 97 (a) and 122 (z). str.join() concatenates a list of strings into one string, using the string it is called on as the separator (here the empty string).

You can collapse this down into a list comprehension that does the same thing:

def Uppercase(s):
    return ''.join([chr(ord(ch) - 32) if 'a' <= ch <= 'z' else ch for ch in s])

but that might be less easily understood if you are just beginning with Python.

The if statement of the first version has been replaced with a conditional expression; the form true_expression if some_test else false_expression first evaluates some_test, then based on the outcome returns true_expression or false_expression.

Either version results in:

>>> Uppercase('Hello world!')
'HELLO WORLD!'
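
If the conditional expression is new to you, it may help to try it on single characters first. This is only a small sketch; the helper name `shift_if_lower` is made up for the example and is not part of either version above.

```python
def shift_if_lower(ch):
    # Form: true_expression if some_test else false_expression
    return chr(ord(ch) - 32) if 'a' <= ch <= 'z' else ch

print(shift_if_lower('h'))  # H  (test is true, so the shifted character is returned)
print(shift_if_lower('H'))  # H  (test is false, character returned unchanged)
print(shift_if_lower('!'))  # !  (test is false, character returned unchanged)
```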

For the listcomp, you can avoid calling `ord` twice, and I think make the code more readable to boot, by doing `'a' <= ch <= 'z'` (or `ch in string.ascii_lowercase`). It reduces the number of "magic numbers" in the code from 3 to 1. (You still need to know that each ASCII upper is 32 less than the corresponding lower, but you don't need to know that `a` is `97` and `z` is `122`.) — abarnert, May 06 '13 at 22:03
@abarnert: thanks, I sometimes forget that we can do that with strings. :-) — Martijn Pieters, May 06 '13 at 22:06
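
For completeness, here is a sketch of the string accumulator the question asked about, combined with the `ch in string.ascii_lowercase` test suggested in the comment above. The function name `uppercase_accumulate` is made up for this example. It should agree with str.upper only for ASCII input; str.upper also converts non-ASCII letters, which this sketch leaves untouched.

```python
import string

def uppercase_accumulate(s):
    x = ''  # string accumulator, as in the question
    for ch in s:
        if ch in string.ascii_lowercase:
            # Lowercase ASCII letter: shift down by 32 to its uppercase form.
            x += chr(ord(ch) - 32)
        else:
            # Anything else is copied over unchanged.
            x += ch
    return x

print(uppercase_accumulate('Hello world!'))  # HELLO WORLD!

# For pure ASCII text the result matches the built-in method.
assert uppercase_accumulate(string.printable) == string.printable.upper()
```

Building the string with += is fine for short inputs; the list-plus-join versions in the answers avoid creating a new string on every iteration.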
