How to extract all UPPER from a string? Python

Question

#input
my_string = 'abcdefgABCDEFGHIJKLMNOP'

How would one extract all the uppercase characters from a string?

#output
my_upper = 'ABCDEFGHIJKLMNOP'

I found a way to do it with a for loop but it did not seem too efficient – O.rka, Apr 08 '13 at 18:35
Please do provide it in the question; it is far from obvious what kind of efficiency constraints you are facing – user4815162342, Apr 08 '13 at 18:36
I'm still getting used to the join function but I knew it would be something along those lines – O.rka, Apr 08 '13 at 18:36
When you say "it did not seem too efficient", what do you mean? You tested it and it was too slow? You suspect there's some quadratic behavior somewhere when it should be linear? Or…? – abarnert, Apr 08 '13 at 18:41
`my_upper=''` followed by `for k in my_string: if k.isupper(): my_upper = my_upper + k` is what I had, but it needs to go through each element of the string to add it to the result. I didn't think that was the fastest way to do this – O.rka, Apr 08 '13 at 18:50
I'm not sure why I got a -1 on this question when technically the criterion for a good question is that it would help a lot of people in the simplest manner. – O.rka, Apr 08 '13 at 18:51
@draconisthe0ry "I found a way to do it with a for loop but it did not seem too efficient": see my answer for efficiency comparisons – Joran Beasley, Apr 08 '13 at 18:55

piokuc · Accepted Answer · 2013-04-08T18:43:47.143

41

Using list comprehension:

>>> s = 'abcdefgABCDEFGHIJKLMNOP'
>>> ''.join([c for c in s if c.isupper()])
'ABCDEFGHIJKLMNOP'

Using generator expression:

>>> ''.join(c for c in s if c.isupper())
'ABCDEFGHIJKLMNOP'

You can also do it using regular expressions:

>>> import re
>>> re.sub('[^A-Z]', '', s)
'ABCDEFGHIJKLMNOP'
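One thing worth checking before choosing between these: the `isupper` versions and the `[^A-Z]` regex only agree on ASCII input. A minimal sketch of the difference, assuming Python 3 (where `str` is Unicode); the sample string is made up for illustration:

```python
import re

# Hypothetical sample: '\u00c4' is 'A' with a diaeresis, a non-ASCII uppercase letter.
s = 'a\u00c4bC'

# str.isupper() follows Unicode case rules, so the non-ASCII capital is kept.
by_isupper = ''.join(c for c in s if c.isupper())

# The class [^A-Z] keeps only ASCII A-Z, so the non-ASCII capital is dropped.
by_regex = re.sub('[^A-Z]', '', s)

print(len(by_isupper))  # 2
print(by_regex)         # C
```

Which behaviour is right depends on whether "UPPER" means ASCII capitals only or any uppercase letter.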

edited Apr 08 '13 at 18:43

answered Apr 08 '13 at 18:32

piokuc


herinkc · Answer 2 · 2013-04-08T18:53:43.650

7

import string
s = 'abcdefgABCDEFGHIJKLMNOP'
s.translate(None,string.ascii_lowercase)

The string.translate(s, table[, deletechars]) function (Python 2) first deletes every character of s that appears in deletechars, then translates the remaining characters using table (which we are not using in this case).

To remove only the lower case letters, you need to pass string.ascii_lowercase as the list of letters to be deleted.

The table is None because when the table is None, only the character deletion step will be performed.
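The call above uses the Python 2 signature. On Python 3, `str.translate` takes a single table argument, and the characters to delete are supplied through the third argument of `str.maketrans`. A sketch of the equivalent, assuming Python 3:

```python
import string

s = 'abcdefgABCDEFGHIJKLMNOP'

# str.maketrans('', '', chars) builds a table mapping each character
# in `chars` to None, which translate() treats as "delete".
delete_lower = str.maketrans('', '', string.ascii_lowercase)

my_upper = s.translate(delete_lower)
print(my_upper)  # ABCDEFGHIJKLMNOP
```

As with the original, this only removes ASCII lowercase letters; digits, punctuation and whitespace would pass through untouched.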

edited Apr 08 '13 at 18:53

answered Apr 08 '13 at 18:34

herinkc


2

I thought about posting this, but ultimately, it fails in too many cases (what about punctuation, non-printing characters, etc.) – mgilson Apr 08 '13 at 18:35
3

Removing all lowercase is only the same as subtracting all uppercase when the data is nothing but letters. The OP's one sample is all letters, so this _might_ be appropriate—but not without explaining the difference. – abarnert Apr 08 '13 at 18:36
4

this is by far the most time efficient method of all the ones given ... assuming it works for OP use case ... – Joran Beasley Apr 08 '13 at 18:53
If you need to deal with non-letters, but still only with ASCII, you can define `ascii_nonuppercase` as, e.g., `''.join(c for c in string.printable if c not in string.ascii_uppercase)`, or just `'0123456789abcdefghijklmnopqrstuvwxyz!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'`, and then use that. If you need to deal with Unicode, that's a non-starter, but otherwise, given Joran Beasley's timing, the slight extra complexity might be worth it. – abarnert Apr 08 '13 at 19:15
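The `ascii_nonuppercase` idea from the last comment can be sketched as follows, assuming Python 3 and ASCII-only input. The sample string and the names `delete_table` and `result` are made up for illustration:

```python
import string

# Every printable ASCII character that is not an uppercase letter.
ascii_nonuppercase = ''.join(
    c for c in string.printable if c not in string.ascii_uppercase
)

# Table that deletes all of those characters.
delete_table = str.maketrans('', '', ascii_nonuppercase)

sample = 'abc DEF, 123 Ghi!'  # hypothetical input with digits and punctuation
result = sample.translate(delete_table)
print(result)  # DEFG
```

Characters outside `string.printable` (other control characters, any non-ASCII text) are not in the table and would be left in place.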

score 6 · Answer 3 · answered Apr 08 '13 at 18:38

6

Higher order functions to the rescue!

filter(str.isupper, "abcdefgABCDEFGHIJKLMNOP")

EDIT: In case you don't know what filter does: filter takes a function and an iterable, and then applies the function to every element in the iterable. It keeps all of the values that return true and throws out all of the rest. Therefore, this will return "ABCDEFGHIJKLMNOP".
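Whether `filter` hands back a string depends on the Python version: on Python 2, filtering a `str` returns a `str`, while on Python 3 `filter` always returns a lazy iterator. A sketch of the Python 3 form, which needs a `join` to get a string:

```python
s = 'abcdefgABCDEFGHIJKLMNOP'

# On Python 3 this is a lazy filter object, not a string.
lazy = filter(str.isupper, s)

# Joining consumes the iterator and yields the string the question asks for.
my_upper = ''.join(lazy)
print(my_upper)  # ABCDEFGHIJKLMNOP
```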

answered Apr 08 '13 at 18:38

hatkirby


3

this is better : `filter(str.isupper,"abvABC")` – Joran Beasley Apr 08 '13 at 18:39
This one also needs a `join`, since the OP wants a string, not a list or a filter iterator. – abarnert Apr 08 '13 at 18:42
1

This does actually return a string. – hatkirby Apr 08 '13 at 18:46

Joran Beasley · Answer 4 · 2013-04-08T18:48:42.247

4

or use regex ... this is an easy answer

import re
print ''.join(re.findall('[A-Z]+',my_string))

just for comparison

In [6]: %timeit filter(str.isupper,my_list)
1000 loops, best of 3: 774 us per loop

In [7]: %timeit ''.join(re.findall('[A-Z]+',my_list))
1000 loops, best of 3: 563 us per loop

In [8]: %timeit re.sub('[^A-Z]', '', my_list)
1000 loops, best of 3: 869 us per loop

In [10]: %timeit ''.join(c for c in my_list if c.isupper())
1000 loops, best of 3: 1.05 ms per loop

So join plus findall is the fastest method (per IPython %timeit on Python 2.6), using the same 10000-character string for every test.

edit: Or not

In [12]: %timeit  my_list.translate(None,string.ascii_lowercase)
10000 loops, best of 3: 51.6 us per loop
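To repeat the comparison outside IPython, something like the following could be used. It is a sketch, assuming Python 3 and the standard `timeit` module; the test input (the sample string repeated to about 10000 characters) and the loop count are assumptions, and the numbers will vary by interpreter and platform:

```python
import re
import string
import timeit

# Assumed test input: the sample string repeated to roughly 10000 characters.
my_string = 'abcdefgABCDEFGHIJKLMNOP' * 435

# Python 3 replacement for translate(None, string.ascii_lowercase).
delete_lower = str.maketrans('', '', string.ascii_lowercase)

candidates = {
    'filter + join': lambda: ''.join(filter(str.isupper, my_string)),
    'findall + join': lambda: ''.join(re.findall('[A-Z]+', my_string)),
    're.sub': lambda: re.sub('[^A-Z]', '', my_string),
    'genexp + join': lambda: ''.join(c for c in my_string if c.isupper()),
    'translate': lambda: my_string.translate(delete_lower),
}

# Sanity check input: every candidate should produce the same string.
results = {name: fn() for name, fn in candidates.items()}

loops = 100
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=loops)
    print('{:<15} {:.1f} us per loop'.format(name, seconds / loops * 1e6))
```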

edited Apr 08 '13 at 18:48

answered Apr 08 '13 at 18:35

Joran Beasley


this will return a list, not a string, though – user4815162342 Apr 08 '13 at 18:37
1

You need to add a `join` to make this work. (If you're absolutely sure that all of the uppercase characters are in a single run, you could use `[0]` instead, of course.) – abarnert Apr 08 '13 at 18:37
fixed :P ... just joined the output at the end – Joran Beasley Apr 08 '13 at 18:38
3

I wouldn't make any strong proclamations about efficiency based on a 10% difference without testing multiple Python versions, platforms, etc., and different input data. For example, I get similar results with CPython 2.7.2, but on 3.3.0, the genexp beats the regex by 5%, while with PyPy 1.9.0, the `filter` beats it by 20%. The order-of-magnitude gain of `translate` is more likely to be trustworthy, but even that drops to a 2:1 gain in a quick test with PyPy. – abarnert Apr 08 '13 at 19:21

Finn · Answer 5 · 2013-04-08T19:33:48.220

4

You could use a more functional approach

>>> s = 'abcdefgABCDEFGHIJKLMNOP'
>>> filter(str.isupper, s)
'ABCDEFGHIJKLMNOP'

edited Apr 08 '13 at 19:33

answered Apr 08 '13 at 18:38

Finn


3

`filter(str.isupper,"abvABC")` lambdas slow down filters ... use builtins when you can :) – Joran Beasley Apr 08 '13 at 18:40
2

You need a `join` here. Otherwise, you're going to return a `list` (2.x) or a filter iterator (3.x), not a string. – abarnert Apr 08 '13 at 18:41
Also, it's a bit misleading to call `filter` "more functional" than a comprehension/genexp, at least without explanation. The language Python borrowed the latter from is Haskell, after all. – abarnert Apr 08 '13 at 19:10

score 0 · Answer 6 · edited May 22 '18 at 20:26

0

here you go:

my_string = 'abcdefgABCDEFGHIJKLMNOP'

cleanChar = ''

for char in my_string:
    if char in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
        cleanChar = cleanChar + char

newChar = cleanChar
print(" {}".format(newChar))

edited May 22 '18 at 20:26

Pollett


answered May 22 '18 at 19:20

user9830600


score 0 · Answer 7 · edited Jun 12 '20 at 03:26

0

my_string = 'abcdefgABCDEFGHIJKLMNOP'
# Print each uppercase ASCII letter on its own line.
for char in my_string:
    if char in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        print(char)

edited Jun 12 '20 at 03:26

Stuart Buckingham


answered Jun 11 '20 at 21:44

Scott

