13

I call a function that returns code containing all kinds of characters, such as (, ", and commas, as well as numbers.

Is there an elegant way to remove all of these so I end up with nothing but letters?

Joan Venge
  • Hey, take a look at this link; someone asked a similar question: http://stackoverflow.com/questions/12851791/removing-numbers-from-string – Kurt Apr 17 '14 at 19:56
  • Why was my comment removed? I saw people like Jon Skeet asking about similar things here which is valid. – Joan Venge Apr 17 '14 at 20:50
  • See also: https://stackoverflow.com/questions/15754587 for *general-purpose* removal (everything not matching a whitelist); https://stackoverflow.com/questions/1450897 for digits only. – Karl Knechtel Aug 01 '22 at 20:24

7 Answers

32

Given

s = '@#24A-09=wes()&8973o**_##me'  # contains letters 'Awesome'    

You can filter out non-alpha characters with a generator expression:

result = ''.join(c for c in s if c.isalpha())

Or filter with filter:

result = ''.join(filter(str.isalpha, s))    

Or you can remove everything that is not an ASCII letter using re.sub:

import re
result = re.sub(r'[^A-Za-z]', '', s)
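The approaches are not fully equivalent once the input contains non-ASCII letters. A small sketch, assuming Python 3 and a made-up sample string: `str.isalpha` accepts any Unicode letter, while the `[^A-Za-z]` pattern keeps ASCII letters only.

```python
import re

# Made-up sample: "naïve café #42", written with escapes for the non-ASCII letters
s = 'na\u00efve caf\u00e9 #42'

# str.isalpha is Unicode-aware in Python 3, so the accented letters are kept
by_isalpha = ''.join(c for c in s if c.isalpha())

# The character class only allows A-Z and a-z, so the accented letters are dropped
by_regex = re.sub(r'[^A-Za-z]', '', s)

print(by_isalpha)  # naïvecafé
print(by_regex)    # navecaf
```

Which behavior is wanted depends on whether "letters" means ASCII letters or letters in any script.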
Steven Rumbalski
  • Wow very slick. I really like when python has elegant solutions like this. – Joan Venge Apr 17 '14 at 19:58
  • 4
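    Note that in Python 2, `isalpha` on a byte string (with the default locale) only counts a-z as letters, not e.g. å, ä, ö, ø, ñ, é or à; in Python 3, `str.isalpha` is Unicode-aware and accepts them. – leo Apr 19 '16 at 14:44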
  • The fastest approach is the second one (`filter`), approximately 2x faster than the others. The first and third variants are almost equal, but `re` is slightly slower. – Alex Dec 22 '22 at 08:49
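Relative speed depends on the Python version and on the input, so it is worth measuring rather than assuming. A rough sketch using `timeit`; the function names and the repeated sample string are made up for this illustration, and no particular ranking is implied.

```python
import re
import timeit

# Sample string from the answer, repeated so the timings are measurable
s = '@#24A-09=wes()&8973o**_##me' * 1000

def with_genexp(text):
    return ''.join(c for c in text if c.isalpha())

def with_filter(text):
    return ''.join(filter(str.isalpha, text))

def with_regex(text):
    return re.sub(r'[^A-Za-z]', '', text)

# Time each variant on the same input and print the total seconds for 20 runs
for func in (with_genexp, with_filter, with_regex):
    seconds = timeit.timeit(lambda: func(s), number=20)
    print(f'{func.__name__}: {seconds:.3f} s')
```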
5

A solution using RegExes is quite easy here:

import re
newstring = re.sub(r"[^a-zA-Z]+", "", string)

Where string is your string and newstring is the string without characters that are not alphabetic. What this does is replace every character that is not a letter by an empty string, thereby removing it. Note however that a RegEx may be slightly overkill here.

A more functional approach would be:

newstring = "".join(filter(str.isalpha, string))

Unfortunately, you can't just call str on a filter object to turn it into a string; that would look much nicer.
The more Pythonic way would be:

newstring = "".join(c for c in string if c.isalpha())
0x32e0edfb
Cu3PO42
3

You didn't mention that you want only English letters, so here's an international solution:

import unicodedata

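text = "hello, ѱϘяԼϷ!"  # renamed from str to avoid shadowing the built-in
# Keep characters whose Unicode general category starts with 'L' (letter)
print(''.join(c for c in text if unicodedata.category(c).startswith('L')))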
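One caveat with filtering on category `L`: an accented letter stored in decomposed form is a base letter followed by a combining mark (category `Mn`), and the mark would be dropped. A minimal sketch, assuming NFC normalization is acceptable for your data; `letters_only` is a hypothetical helper name.

```python
import unicodedata

def letters_only(text):
    # Compose base letters with their combining marks first (NFC),
    # so an accented letter becomes a single code point in category L
    composed = unicodedata.normalize('NFC', text)
    return ''.join(c for c in composed if unicodedata.category(c).startswith('L'))

# 'e' followed by U+0301 COMBINING ACUTE ACCENT, plus a space and digits
decomposed = 'cafe\u0301 123'
print(letters_only(decomposed))  # café
```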
gog
1

Here's another one, using string.ascii_letters:

>>> import string
>>> "".join(x for x in s if x in string.ascii_letters)
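For a self-contained run, here is a sketch that reuses the sample string from the first answer. Wrapping `string.ascii_letters` in a `frozenset` is an optional tweak (an assumption of this sketch, not something the snippet above requires); it only changes how the membership test is performed.

```python
import string

s = '@#24A-09=wes()&8973o**_##me'  # sample string from the first answer

# A set gives constant-time membership tests; the plain string works too
ascii_letters = frozenset(string.ascii_letters)

result = ''.join(x for x in s if x in ascii_letters)
print(result)  # Awesome
```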

msvalkon
1
>>> import re
>>> string = "';''';;';1123123!@#!@#!#!$!sd         sds2312313~~\"~s__"
>>> re.sub(r"[\W\d_]", "", string)
'sdsdss'
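How the pattern works: `\W` matches anything that is not a word character (letters, digits, underscore), and adding `\d` and `_` to the class removes digits and underscores as well, leaving only letters. In Python 3 this is Unicode-aware by default; the sketch below, with a made-up sample string, shows the effect of the `re.ASCII` flag.

```python
import re

# Made-up sample: "naïve_café 42!", written with escapes for the non-ASCII letters
text = 'na\u00efve_caf\u00e9 42!'

# Default (Unicode) matching: accented letters count as word characters and are kept
unicode_letters = re.sub(r'[\W\d_]', '', text)

# With re.ASCII, only a-z and A-Z count as letters, so the accented ones are removed
ascii_letters = re.sub(r'[\W\d_]', '', text, flags=re.ASCII)

print(unicode_letters)  # naïvecafé
print(ascii_letters)    # navecaf
```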
Sayan Chowdhury
1

I use this myself in this kind of situation.

Sorry if it's outdated :)

string = "The quick brown fox jumps over the lazy dog!"
alphabet = "abcdefghijklmnopqrstuvwxyz"

def letters_only(source):
    # Note: the input is lowercased first, so the result loses its original case
    result = ""
    for i in source.lower():
        if i in alphabet:  # keep only a-z
            result += i
    return result

print(letters_only(string))
0
s = '@#24A-09=wes()&8973o**_##me'

print(filter(str.isalpha, s))

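# Awesome (Python 2 only; in Python 3, filter returns an iterator, so use ''.join(filter(str.isalpha, s)))

About the return value of filter in Python 2: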

filter(function or None, sequence) -> list, tuple, or string
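In Python 3, `filter` returns a lazy iterator rather than a string, so printing it directly shows a filter object. A short sketch of the Python 3 equivalent, joining the filtered characters back into a string:

```python
s = '@#24A-09=wes()&8973o**_##me'

filtered = filter(str.isalpha, s)
# In Python 3 this is an iterator of characters, not a string
print(type(filtered).__name__)  # filter

# Join the characters to get the string back
result = ''.join(filtered)
print(result)  # Awesome
```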
Omid Raha