I call a function that returns code containing all kinds of characters: parentheses, quotes, commas, and digits.
Is there an elegant way to remove all of these so I end up with nothing but letters?
Given
s = '@#24A-09=wes()&8973o**_##me' # contains letters 'Awesome'
You can filter out non-alpha characters with a generator expression:
result = ''.join(c for c in s if c.isalpha())
Or filter with filter:
result = ''.join(filter(str.isalpha, s))
Or you can replace non-alpha characters with empty strings using re.sub:
import re
result = re.sub(r'[^A-Za-z]', '', s)
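One difference between these options is worth checking before picking one: in Python 3, str.isalpha accepts any Unicode letter, while the [^A-Za-z] pattern keeps only ASCII letters. A minimal sketch, assuming Python 3; the sample string is made up for illustration:

```python
import re

# Hypothetical input with accented letters: "naïve café #42"
sample = "na\u00efve caf\u00e9 #42"

# str.isalpha is Unicode-aware, so accented letters survive
unicode_letters = ''.join(c for c in sample if c.isalpha())

# The character class keeps only ASCII A-Z and a-z
ascii_letters = re.sub(r'[^A-Za-z]', '', sample)

print(unicode_letters)  # naïvecafé
print(ascii_letters)    # navecaf
```

If the input is known to be plain ASCII, the three approaches should give the same result.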
A solution using RegExes is quite easy here:
import re
newstring = re.sub(r"[^a-zA-Z]+", "", string)
Where string is your string and newstring is the string without characters that are not alphabetic. What this does is replace every character that is not a letter by an empty string, thereby removing it. Note, however, that a RegEx may be slightly overkill here.
A more functional approach would be:
newstring = "".join(filter(str.isalpha, string))
Unfortunately, you can't just call str on a filter object to turn it into a string; that would look much nicer...
The Pythonic way would be:
newstring = "".join(c for c in string if c.isalpha())
You didn't mention that you want only English letters, so here's an international solution:
import unicodedata
text = "hello, ѱϘяԼϷ!"  # renamed from str to avoid shadowing the built-in
print(''.join(c for c in text if unicodedata.category(c).startswith('L')))
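As a cross-check, the Python 3 documentation describes str.isalpha in terms of the same Unicode "Letter" general categories (Lm, Lt, Lu, Ll, Lo), so the two filters should agree. A small sketch, assuming Python 3, that compares them on the sample above:

```python
import unicodedata

text = "hello, ѱϘяԼϷ!"

# Keep characters whose Unicode general category starts with 'L' (Letter)
by_category = ''.join(
    c for c in text if unicodedata.category(c).startswith('L')
)

# Keep characters that str.isalpha considers alphabetic
by_isalpha = ''.join(c for c in text if c.isalpha())

print(by_category)
print(by_category == by_isalpha)  # True
```

The unicodedata version is still useful when you want a narrower test, for example only uppercase letters via the 'Lu' category.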
Here's another one, using string.ascii_letters:
>>> import string
>>> "".join(x for x in s if x in string.ascii_letters)
>>> import re
>>> string = "';''';;';1123123!@#!@#!#!$!sd sds2312313~~\"~s__"
>>> re.sub(r"[\W\d_]", "", string)
'sdsdss'
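Note that for str patterns in Python 3, \W is Unicode-aware by default, so this pattern keeps non-ASCII letters, unlike [^a-zA-Z]. The sketch below, assuming Python 3 and a made-up sample string, shows the difference and how the re.ASCII flag restricts matching to ASCII:

```python
import re

# Hypothetical input: "déjà vu_2024!"
sample = "d\u00e9j\u00e0 vu_2024!"

# Removes non-word characters, digits and underscores; keeps Unicode letters
unicode_aware = re.sub(r"[\W\d_]", "", sample)

# With re.ASCII, \W treats accented letters as non-word characters
ascii_flag = re.sub(r"[\W\d_]", "", sample, flags=re.ASCII)

# Explicit ASCII-only character class for comparison
ascii_class = re.sub(r"[^a-zA-Z]", "", sample)

print(unicode_aware)  # déjàvu
print(ascii_flag)     # djvu
print(ascii_class)    # djvu
```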
Well, I use this myself in this kind of situation.
Sorry if it's outdated :)
string = "The quick brown fox jumps over the lazy dog!"
alphabet = "abcdefghijklmnopqrstuvwxyz"
def letters_only(source):
    result = ""
    for i in source.lower():
        if i in alphabet:
            result += i
    return result
print(letters_only(string))
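The function above lowercases its input, so the original capitalization is lost. If you want to keep the case, a possible variant is sketched below. The name letters_only_keep_case and the LETTERS constant are hypothetical, introduced only for this example:

```python
import string

# Assumption: only ASCII letters count; a frozenset gives fast membership tests
LETTERS = frozenset(string.ascii_letters)

def letters_only_keep_case(source):
    """Return the ASCII letters of source in order, preserving their case."""
    return ''.join(ch for ch in source if ch in LETTERS)

kept = letters_only_keep_case("The quick brown fox jumps over the lazy dog!")
print(kept)  # Thequickbrownfoxjumpsoverthelazydog
```

Using ''.join on a generator also avoids building the result with repeated += on a string.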
s = '@#24A-09=wes()&8973o**_##me'
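print(''.join(filter(str.isalpha, s)))
# Awesome
About the return value of filter: in Python 2 its signature is
filter(function or None, sequence) -> list, tuple, or string
so filter(str.isalpha, s) already returns a string there. In Python 3, filter returns a lazy iterator instead, which is why the ''.join(...) is needed.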