8

What's the easiest way to check if a string only contains certain specified characters in Python? (Without using RegEx or anything, of course)

Specifically, I have a list of strings, and I want to filter out all of them except the words that are made up ONLY of letters from another string. For example, filtering ['aba', 'acba', 'caz'] through 'abc' should give ['aba', 'acba'] (z is not in abc).

In other words, keep only the items that can be made using the given letters.

user1251007
Jollywatt

7 Answers

13

Assuming the discrepancy in your example is a typo, then this should work:

my_list = ['aba', 'acba', 'caz']
result = [s for s in my_list if not s.strip('abc')]

This results in ['aba', 'acba']. str.strip(chars) removes leading and trailing characters that appear in chars, so it returns an empty string exactly when the string contains nothing but those characters. The order of the characters in chars does not matter.
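To make the strip behaviour concrete, here is a minimal sketch. The helper name only_uses is made up for illustration, and the extra word 'azb' is added only to show that a disallowed character in the middle of a word is still caught:

```python
def only_uses(word, allowed):
    """Return True if every character of `word` appears in `allowed`."""
    # str.strip peels characters off both ends until it meets one that
    # is not in `allowed`; anything left over means a disallowed character.
    return not word.strip(allowed)

words = ['aba', 'acba', 'caz', 'azb']  # 'azb' is an extra illustrative word
result = [w for w in words if only_uses(w, 'abc')]
print(result)
```

Note that the empty string passes this check too, since there is nothing left after stripping.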

Andrew Gorcester
8

You can make use of sets:

>>> l = ['aba', 'acba', 'caz']
>>> s = set('abc')
>>> [item for item in l if not set(item).difference(s)]
['aba', 'acba']
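Since the allowed characters are already a set, the subset test can also be asked of that set directly, which is what the comments on this answer arrive at. A small sketch, assuming Python 3 (the variable names are illustrative):

```python
allowed = set('abc')
words = ['aba', 'acba', 'caz']

# issuperset accepts any iterable, so each word does not need to be
# converted to a set first.
result = [w for w in words if allowed.issuperset(w)]
print(result)
```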
alecxe
  • @Bakuriu +1 Yup - where you've got a set already, use an appropriate method on that instead of converting another item to set and providing it the existing set :) – Jon Clements Sep 09 '13 at 09:42
  • Uhm. It just occurred to me that `symmetric_difference` won't do, because it matches if and only if `item` contains *exactly* the same elements as `s` (`set('abc').symmetric_difference('aba') -> {'c'}`), while we want the elements of `item` to be a subset of `s`, so `s.issuperset(item)` should do. :s – Bakuriu Sep 09 '13 at 09:46
  • @Bakuriu your comment about `frozenset` was correct, tested and seen that `frozenset` is slower here. Thanks. – alecxe Sep 09 '13 at 09:51
  • @alecxe I believe CPython has mostly the same implementation for `set` and `frozenset`s(e.g. there are common functions to create a new `set`, update it etc see the `Objects/setobject.c` file in the sources). `frozenset`s simply does not provide the methods that have side-effects. I removed that comment since the rest was wrong. – Bakuriu Sep 09 '13 at 09:56
6

Assuming you want only the strings in your list that consist solely of characters from your search string, you can use a set difference:

>>> hay = ['aba', 'acba', 'caz']
>>> needle = set('abc')
>>> [h for h in hay if not set(h) - needle]
['aba', 'acba']

If you want to avoid sets, you can do the same with str.translate (shown here in its Python 2 form). It deletes every character that is in your search string, so a word passes if nothing is left.

>>> needle = 'abc'
>>> [h for h in hay if not h.translate(None,needle)]
['aba', 'acba']
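The two-argument translate call above is Python 2 only. On Python 3, a rough equivalent builds a deletion table with str.maketrans first. This is a sketch under that assumption, and the name delete_needle is made up here:

```python
hay = ['aba', 'acba', 'caz']
needle = 'abc'

# The third argument of str.maketrans lists characters to delete.
delete_needle = str.maketrans('', '', needle)

# A word passes when nothing remains after deleting the allowed characters.
result = [h for h in hay if not h.translate(delete_needle)]
print(result)
```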
Cristian Ciupitu
Abhijit
4

Something like this:

strings = ['aba', 'acba', 'caz']
given = "abc"
filter(lambda string: all(char in given for char in string), strings)
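On Python 3, filter returns a lazy iterator rather than a list, so the result needs to be materialised before it can be printed or indexed. A sketch assuming Python 3, with given turned into a set so each membership test does not rescan the string:

```python
strings = ['aba', 'acba', 'caz']
given = set("abc")

# list(...) forces the lazy filter object into a concrete list.
result = list(filter(lambda s: all(ch in given for ch in s), strings))
print(result)
```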
Cristian Ciupitu
Bleeding Fingers
1

The question is somewhat ambiguous about whether letters from the base string may be reused, that is, whether a word may use a letter more often than the base string contains it. This solution covers both readings with a function that takes a reuse parameter:

from collections import Counter

def anagram_filter(data, base, reuse=True):
    if reuse: # all characters in objects in data are in base, count ignored
        base = set(base)
        return [d for d in data if not set(d).difference(base)]
    r = []
    cb = Counter(base)
    for d in data:
        for k, v in Counter(d).items():  # items() works on Python 2 and 3
            if v > cb[k]:  # cb[k] is 0 when k is not in base
                break
        else:
            r.append(d)
    return r

Usage:

>>> anagram_filter(['aba', 'acba', 'caz'], 'abc')
['aba', 'acba']
>>> anagram_filter(['aba', 'acba', 'caz'], 'abc', False)
[]
>>> anagram_filter(['aba', 'cba', 'caz'], 'abc', False)
['cba']
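For the no-reuse case, the same check can be written more compactly with Counter subtraction, which discards counts that drop to zero or below. This is an alternative sketch, not the function above, and the name fits_within is made up for illustration:

```python
from collections import Counter

def fits_within(word, base):
    """Return True if `word` uses no letter more often than `base` has it."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means every letter of `word` is covered by `base`.
    return not (Counter(word) - Counter(base))

result = [w for w in ['aba', 'cba', 'caz'] if fits_within(w, 'abc')]
print(result)
```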
Inbar Rose
0

Below is the code:

a = ['aba', 'acba', 'caz']
needle = 'abc'

def onlyNeedle(word):
    for letter in word:
        if letter not in needle:
            return False

    return True

a = list(filter(onlyNeedle, a))  # list() so this also works on Python 3

print(a)
Snowwolf
0

I will assume your reluctance to use regexps is not really an issue:

import re

strings = ['aba', 'acba', 'caz']
given = "abc"
filter(lambda value: re.match("^[" + given + "]*$", value), strings)
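If the allowed characters might include regex metacharacters such as - or ], escaping them first is safer. A sketch assuming Python 3 and a non-empty set of allowed characters (an empty character class is not a valid pattern); the helper name only_chars is made up for illustration:

```python
import re

def only_chars(strings, given):
    """Keep the strings made up solely of characters from `given`."""
    # re.escape neutralises characters like '-' and ']' that would
    # otherwise change the meaning of the character class.
    pattern = re.compile("[" + re.escape(given) + "]*")
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return [s for s in strings if pattern.fullmatch(s)]

result = only_chars(['aba', 'acba', 'caz'], "abc")
print(result)
```

Without the escaping, a value such as "a-c" would be read as the range a to c and would wrongly accept 'b'.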
njzk2
  • `re.escape(given)` would be safer. – Ashwini Chaudhary Sep 09 '13 at 09:31
  • 1
    @Paco : the `of course` part could indicate a prejudice of inherent complexity. I think this prejudice should be corrected. Hence my assumption. I could be wrong, of course, but I try to show that `RegEx or anything` are simple and friendly. – njzk2 Sep 09 '13 at 09:38