21
#input
my_string = 'abcdefgABCDEFGHIJKLMNOP'

How would one extract all the uppercase characters from a string?

#output
my_upper = 'ABCDEFGHIJKLMNOP'
O.rka
  • can you show us what you have tried so far? – user4815162342 Apr 08 '13 at 18:31
  • I found a way to do it with a for loop, but it did not seem too efficient – O.rka Apr 08 '13 at 18:35
  • please do provide it in the question; it is far from obvious what kind of efficiency constraints you are facing – user4815162342 Apr 08 '13 at 18:36
  • I'm still getting used to the join function, but I knew it would be something along those lines – O.rka Apr 08 '13 at 18:36
  • 1
    When you say "it did not seem too efficient", what do you mean? You tested it and it was too slow? You suspect there's some quadratic behavior somewhere when it should be linear? Or…? – abarnert Apr 08 '13 at 18:41
  • `my_upper = ''` followed by `for k in my_string: if k.isupper(): my_upper = my_upper + k` is what I had, but it needs to go through each element of the string to add it to the result. I didn't think that was the fastest way to do this – O.rka Apr 08 '13 at 18:50
  • I'm not sure why I got a -1 from this question when technically the criteria for a good question is one that would help a lot of people in the most simplistic manner. – O.rka Apr 08 '13 at 18:51
  • @draconisthe0ry "i found a way to do it with a for loop but it did not seem too efficient": see my answer for efficiency comparisons – Joran Beasley Apr 08 '13 at 18:55

7 Answers

41

Using list comprehension:

>>> s = 'abcdefgABCDEFGHIJKLMNOP'
>>> ''.join([c for c in s if c.isupper()])
'ABCDEFGHIJKLMNOP'

Using generator expression:

>>> ''.join(c for c in s if c.isupper())
'ABCDEFGHIJKLMNOP'

You can also do it using regular expressions:

>>> import re
>>> re.sub('[^A-Z]', '', s)
'ABCDEFGHIJKLMNOP'
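
One difference between these approaches is worth a quick illustration: the character class `[A-Z]` covers only ASCII capitals, while `str.isupper` follows Unicode case rules, so the two can disagree on non-ASCII text. A small sketch, assuming Python 3 (the sample string is made up for illustration):

```python
import re

# Hypothetical sample containing a non-ASCII capital letter.
s = 'aÉbCd'

by_isupper = ''.join(c for c in s if c.isupper())
by_regex = re.sub('[^A-Z]', '', s)

print(by_isupper)  # ÉC
print(by_regex)    # C
```

Which behaviour is right depends on whether the input is guaranteed to be ASCII.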
piokuc
7
import string
s = 'abcdefgABCDEFGHIJKLMNOP'
s.translate(None, string.ascii_lowercase)  # Python 2 only; returns 'ABCDEFGHIJKLMNOP'

The string.translate(s, table[, deletechars]) function first deletes from s every character that appears in deletechars, then translates the remaining characters using table (which we are not using in this case).

To remove only the lowercase letters, pass string.ascii_lowercase as the characters to delete.

Passing None as the table skips the translation step, so only the character deletion is performed.
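
The two-argument call above is Python 2 syntax. In Python 3, `str.translate` takes a single table argument, and the characters to delete go into the third argument of `str.maketrans`. A minimal sketch of the equivalent, assuming Python 3:

```python
import string

s = 'abcdefgABCDEFGHIJKLMNOP'

# The third argument of str.maketrans lists the characters to delete.
delete_lowercase = str.maketrans('', '', string.ascii_lowercase)
result = s.translate(delete_lowercase)
print(result)  # ABCDEFGHIJKLMNOP
```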

herinkc
  • 2
    I thought about posting this, but ultimately, it fails in too many cases (what about punctuation, non-printing characters, etc.) – mgilson Apr 08 '13 at 18:35
  • 3
    Removing all lowercase is only the same as subtracting all uppercase when the data is nothing but letters. The OP's one sample is all letters, so this _might_ be appropriate—but not without explaining the difference. – abarnert Apr 08 '13 at 18:36
  • 4
    this is by far the most time efficient method of all the ones given ... assuming it works for OP use case ... – Joran Beasley Apr 08 '13 at 18:53
  • If you need to deal with non-letters, but still only with ASCII, you can define `ascii_nonuppercase` as, e.g., `''.join(c for c in string.printable if c not in string.ascii_uppercase)`, or just `'0123456789abcdefghijklmnopqrstuvwxyz!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'`, and then use that. If you need to deal with Unicode that's a non-starter, but otherwise, given Joran Beasley's timing, the slight extra complexity might be worth it. – abarnert Apr 08 '13 at 19:15
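
The suggestion in the last comment, deleting every printable ASCII character that is not an uppercase letter, could look roughly like this. It is a sketch assuming Python 3 and ASCII-only input; the name `ascii_nonuppercase` comes from the comment, while `table`, `sample` and `kept` are invented for illustration.

```python
import string

# Every printable ASCII character except the uppercase letters.
ascii_nonuppercase = ''.join(
    c for c in string.printable if c not in string.ascii_uppercase
)
table = str.maketrans('', '', ascii_nonuppercase)

sample = 'abc, DEF! 123 ghi-JKL'  # made-up input with punctuation and digits
kept = sample.translate(table)
print(kept)  # DEFJKL
```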
6

Higher order functions to the rescue!

filter(str.isupper, "abcdefgABCDEFGHIJKLMNOP")

EDIT: In case you don't know what filter does: filter takes a function and an iterable, and applies the function to every element of the iterable. It keeps the elements for which the function returns true and throws out the rest. Therefore, in Python 2 this returns "ABCDEFGHIJKLMNOP".
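
In Python 3, `filter` returns a lazy iterator rather than a string, so the characters need to be joined back together. A minimal sketch, assuming Python 3:

```python
s = 'abcdefgABCDEFGHIJKLMNOP'

# filter() yields the characters for which str.isupper is true;
# ''.join() assembles them back into a single string.
my_upper = ''.join(filter(str.isupper, s))
print(my_upper)  # ABCDEFGHIJKLMNOP
```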

hatkirby
4

Or use a regex; this is an easy answer:

import re
print(''.join(re.findall('[A-Z]+', my_string)))

Just for comparison:

In [6]: %timeit filter(str.isupper,my_list)
1000 loops, best of 3: 774 us per loop

In [7]: %timeit ''.join(re.findall('[A-Z]+',my_list))
1000 loops, best of 3: 563 us per loop

In [8]: %timeit re.sub('[^A-Z]', '', my_list)
1000 loops, best of 3: 869 us per loop

In [10]: %timeit ''.join(c for c in my_list if c.isupper())
1000 loops, best of 3: 1.05 ms per loop

So join plus findall is the fastest of these methods (per IPython %timeit on Python 2.6), using the same 10000-character string for each test.

Edit: or not.

In [12]: %timeit  my_list.translate(None,string.ascii_lowercase)
10000 loops, best of 3: 51.6 us per loop
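
To reproduce this kind of comparison outside IPython, the standard `timeit` module can be used. The sketch below assumes Python 3 and builds its own 10000-character test string (the original input is not shown, so `my_string` here is a stand-in, and the dictionary keys are just labels). Absolute numbers will vary by interpreter and machine.

```python
import re
import string
import timeit

# Stand-in input: 10000 characters of mixed-case ASCII letters.
my_string = 'abcdeABCDE' * 1000

# Build the deletion table once, outside the timed code.
delete_lowercase = str.maketrans('', '', string.ascii_lowercase)

candidates = {
    'filter': lambda: ''.join(filter(str.isupper, my_string)),
    'findall': lambda: ''.join(re.findall('[A-Z]+', my_string)),
    'sub': lambda: re.sub('[^A-Z]', '', my_string),
    'genexp': lambda: ''.join(c for c in my_string if c.isupper()),
    'translate': lambda: my_string.translate(delete_lowercase),
}

# Sanity check: every approach should produce the same string.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100)
    print('{:10s} {:8.1f} us per call'.format(name, seconds / 100 * 1e6))
```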
Joran Beasley
  • this will return a list, not a string, though – user4815162342 Apr 08 '13 at 18:37
  • 1
    You need to add a `join` to make this work. (If you're absolutely sure that all of the uppercase characters are in a single run, you could use `[0]` instead, of course.) – abarnert Apr 08 '13 at 18:37
  • fixed :P ... just joined the output at the end – Joran Beasley Apr 08 '13 at 18:38
  • 3
    I wouldn't make any strong proclamations about efficiency based on a 10% difference without testing multiple Python versions, platforms, etc., and different input data. For example, I get similar results with CPython 2.7.2, but on 3.3.0, the genexp beats the regex by 5%, while with PyPy 1.9.0, the `filter` beats it by 20%. The order-of-magnitude gain of `translate` is more likely to be trustworthy, but even that drops to a 2:1 gain in a quick test with PyPy. – abarnert Apr 08 '13 at 19:21
4

You could use a more functional approach

>>> s = 'abcdefgABCDEFGHIJKLMNOP'
>>> filter(str.isupper, s)
'ABCDEFGHIJKLMNOP'
Finn
  • 3
    `filter(str.isupper,"abvABC")` lambdas slow down filters ... use builtins when you can :) – Joran Beasley Apr 08 '13 at 18:40
  • 2
    You need a `join` here. Otherwise, you're going to return a `list` (2.x) or a filter iterator (3.x), not a string. – abarnert Apr 08 '13 at 18:41
  • Also, it's a bit misleading to call `filter` "more functional" than a comprehension/genexp, at least without explanation. The language Python borrowed the latter from is Haskell, after all. – abarnert Apr 08 '13 at 19:10
0

Here you go:

import string

my_string = 'abcdefgABCDEFGHIJKLMNOP'

# Collect the uppercase ASCII letters one character at a time.
clean_chars = ''
for char in my_string:
    if char in string.ascii_uppercase:
        clean_chars = clean_chars + char

print(clean_chars)
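
Repeated `+` concatenation can copy the growing string on each step (CPython sometimes optimizes this, but it is not guaranteed). A common alternative is to collect the characters in a list and join once at the end. A sketch with hypothetical names, not the answerer's code:

```python
import string

my_string = 'abcdefgABCDEFGHIJKLMNOP'

upper_chars = []  # hypothetical name: accumulates the matching characters
for char in my_string:
    if char in string.ascii_uppercase:
        upper_chars.append(char)

# Join once at the end instead of concatenating inside the loop.
my_upper = ''.join(upper_chars)
print(my_upper)  # ABCDEFGHIJKLMNOP
```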
Pollett
0
my_string = 'abcdefgABCDEFGHIJKLMNOP'
for char in my_string:
    if char in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        print(char)  # prints each uppercase letter on its own line
Stuart Buckingham
Scott