18

Is there a short way to remove all strings that contain numbers from a list?

For example

my_list = [ 'hello' , 'hi', '4tim', '342' ]

would return

my_list = [ 'hello' , 'hi']
user1506145

6 Answers

39

Without regex:

[x for x in my_list if not any(c.isdigit() for c in x)]
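As a quick check, the comprehension above can be wrapped in a small function and run on the list from the question. This is only a sketch; the name `without_digits` is illustrative and does not come from the answer.

```python
def without_digits(strings):
    """Keep only the strings that contain no digit characters."""
    return [s for s in strings if not any(c.isdigit() for c in s)]

my_list = ['hello', 'hi', '4tim', '342']
result = without_digits(my_list)
print(result)  # ['hello', 'hi']
```

Strings with spaces or punctuation but no digits, such as `'a b!'`, are kept by this version.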
eumiro
7

I find using isalpha() the most elegant, but it will also remove items that contain other non-alphabetic characters:

Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”

my_list = [item for item in my_list if item.isalpha()]
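To make the difference in behavior concrete, here is a small comparison, assuming Python 3 (where every `str` is Unicode). The sample strings are made up for illustration and are not from the question.

```python
# Hypothetical sample data: plain words, a space, an apostrophe, digits,
# non-ASCII letters, and the empty string.
samples = ['hello', 'hi there', "don't", '4tim', 'åäö', '']

# isalpha() keeps a string only if it is non-empty and every character is a letter.
only_letters = [s for s in samples if s.isalpha()]

# The digit test keeps any string that has no digit character in it.
no_digits = [s for s in samples if not any(c.isdigit() for c in s)]

print(only_letters)  # ['hello', 'åäö']
print(no_digits)     # ['hello', 'hi there', "don't", 'åäö', '']
```

The two filters agree on the list from the question, but `isalpha()` also drops strings with spaces, punctuation, or no characters at all.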
Adam
  • He wants to remove strings with numbers, but special characters (spaces, punctuation,…) are probably allowed. – eumiro Apr 18 '13 at 13:46
  • Except it won't work for punctuation – jamylak Apr 18 '13 at 13:48
  • That's correct. I still thought I'd include it because it *will* work for many scenarios. – Adam Apr 18 '13 at 13:49
  • And written "old style", which works (IMHO) for readability in this case (and if run on 2.x) is, `filter(str.isalpha, my_list)` – Jon Clements Apr 18 '13 at 13:50
  • That's not old style! I would do it that way – jamylak Apr 18 '13 at 13:51
  • The problem with this is that it also removes characters åäö, but that is exactly what I want to do, remove all non characters – user1506145 Apr 18 '13 at 13:52
  • @user1506145 - then define it in your question please. – eumiro Apr 18 '13 at 13:52
  • @user1506145 it will work just fine if you encode them in unicode, i.e. by using the `u` prefix as in `u'åääö'`. In Python 3, all strings are unicode and this is not an issue. – Adam Apr 18 '13 at 13:57
4

I'd use a regex:

import re
my_list = [s for s in my_list if not re.search(r'\d',s)]

In terms of timing, using a regex is significantly faster on your sample data than the isdigit solution. Admittedly, it's slower than isalpha, but the behavior is slightly different with punctuation, whitespace, etc. Since the problem doesn't specify what should happen with those strings, it's not clear which is the best solution.

import re
import timeit

my_list = ['hello', 'hi', '4tim', '342', 'adn322']

def isalpha(mylist):
    return [item for item in mylist if item.isalpha()]

def fisalpha(mylist):
    # list() forces evaluation on Python 3, where filter() is lazy
    return list(filter(str.isalpha, mylist))

def regex(mylist, myregex=re.compile(r'\d')):
    # the pattern is compiled once, when the function is defined
    return [s for s in mylist if not myregex.search(s)]

def isdigit(mylist):
    return [x for x in mylist if not any(c.isdigit() for c in x)]

# time each function by name, importing it from the running script
for func in ('isalpha', 'fisalpha', 'regex', 'isdigit'):
    print(func, timeit.timeit(func + '(my_list)', 'from __main__ import my_list,' + func))

Here are my results:

isalpha 1.80665302277
fisalpha 2.09064006805
regex 2.98224401474
isdigit 8.0824341774
mgilson
1

Try:

import re
my_list = [x for x in my_list if re.match("^[A-Za-z_-]*$", x)]
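A short sketch of what this whitelist pattern accepts and rejects may help. The extra sample strings are made up for illustration; note that because of the `*` quantifier, the empty string also passes.

```python
import re

# Same pattern as above, compiled once; the hyphen is last in the class, so it is literal.
allowed = re.compile(r"^[A-Za-z_-]*$")

# Hypothetical samples covering underscores, hyphens, spaces, and the empty string.
samples = ['hello', 'hi', '4tim', '342', 'snake_case', 'kebab-case', 'two words', '']

kept = [s for s in samples if allowed.match(s)]
print(kept)  # ['hello', 'hi', 'snake_case', 'kebab-case', '']
```

Replacing `*` with `+` would reject the empty string, if that is the behavior you want.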
Pablo Santa Cruz
0

Sure, use the `digits` constant from the `string` module and test whether any of them are present. We'll get a little fancy and just test the truthiness of a list comprehension: if it returns anything, there are digits in the string.

So:

import string

out_list = []
for item in my_list:
    # the inner list is non-empty (truthy) only if item contains a digit
    if not [char for char in item if char in string.digits]:
        out_list.append(item)
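A possible variation on the same idea, offered as an alternative sketch and not as this answer's method, is to put `string.digits` in a set and use `set.isdisjoint`, which avoids building a throwaway list for each item. The names `DIGITS` and `drop_strings_with_digits` are illustrative.

```python
import string

# string.digits is '0123456789', so only ASCII digits are considered here.
DIGITS = set(string.digits)

def drop_strings_with_digits(items):
    """Keep the items that share no character with DIGITS."""
    return [item for item in items if DIGITS.isdisjoint(item)]

my_list = ['hello', 'hi', '4tim', '342']
out_list = drop_strings_with_digits(my_list)
print(out_list)  # ['hello', 'hi']
```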
GoingTharn
0

And yet another slight variation:

>>> import re
>>> list(filter(re.compile(r'(?i)[a-z]+$').match, my_list))
['hello', 'hi']

Add any other characters you want to allow (spaces, punctuation, and so on) to the character class in the regex.
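For instance, a sketch that also allows spaces and a few punctuation marks could look like the following. The choice of extra characters is an assumption made for illustration, and the pattern is anchored with `+$` so that the whole string is checked, not just its first character.

```python
import re

# Letters (case-insensitive) plus space, apostrophe, comma, period, '!', '?' and hyphen.
# Which extra characters to allow is an assumption; adjust the class to your data.
allowed = re.compile(r"(?i)[a-z ',.!?-]+$")

samples = ['hello', 'hi there!', "don't", '4tim', 'adn322', '342']

# list() makes the result a real list on Python 3, where filter() is lazy.
kept = list(filter(allowed.match, samples))
print(kept)  # ['hello', 'hi there!', "don't"]
```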

Jon Clements