Test if string ONLY contains given characters

Question

What's the easiest way to check if a string only contains certain specified characters in Python? (Without using RegEx or anything, of course)

Specifically, I have a list of strings, and I want to filter out all of them except the words that are ONLY made up of ANY of the letters in another string. For example, filtering ['aba', 'acba', 'caz'] through 'abc' should give ['aba', 'acba']. (z not in abc)

Just like only keeping the items that can be made using the given letters.

And what did you try to achieve it? Given the condition of no RegEx it looks very like a homework assignment. — Tymoteusz Paul, Sep 09 '13 at 09:15
what's wrong with regex ? that would be a quite trivial line... — njzk2, Sep 09 '13 at 09:22
@Joseph What do you mean? The second item in the list is `'acba'` and in the expected output it is reduced to just `'acb'`? — Ashwini Chaudhary, Sep 09 '13 at 09:23
Yes, I want it to return `acd` or `acba`. And no, this isn't actually homework, I'm writing a program that counts how many English words that can be made from constrained characters. But yes, I am a noob. — Jollywatt, Sep 09 '13 at 09:25

Andrew Gorcester · Accepted Answer · 2014-05-13T20:32:02.540

13

Assuming the discrepancy in your example is a typo, then this should work:

my_list = ['aba', 'acba', 'caz']
result = [s for s in my_list if not s.strip('abc')]

This results in ['aba', 'acba']. s.strip(characters) returns an empty string when s contains nothing but the given characters. The order of the characters does not matter.
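
To see why the strip approach works, here is a minimal sketch. The helper name only_contains is made up for illustration and is not part of the answer. Note that an empty string also passes this check.

```python
def only_contains(s, allowed):
    """Return True if every character of s appears in allowed.

    only_contains is a hypothetical helper name used for illustration.
    """
    # strip() peels characters in `allowed` off both ends and stops at the
    # first character that is not in `allowed`. An empty result therefore
    # means no such character exists anywhere in s.
    return not s.strip(allowed)


words = ['aba', 'acba', 'caz']
result = [w for w in words if only_contains(w, 'abc')]
print(result)  # ['aba', 'acba']
```

'caz' is rejected because stripping 'c' and 'a' from the left leaves 'z', which is not empty.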

edited May 13 '14 at 20:32

answered Sep 09 '13 at 09:22

Andrew Gorcester


alecxe · Answer 2 · 2013-09-09T09:50:05.750

8

You can make use of sets:

>>> l = ['aba', 'acba', 'caz']
>>> s = set('abc')
>>> [item for item in l if not set(item).difference(s)]
['aba', 'acba']

edited Sep 09 '13 at 09:50

answered Sep 09 '13 at 09:19

alecxe


@Bakuriu +1 Yup - where you've got a set already, use an appropriate method on that instead of converting another item to set and providing it the existing set :) – Jon Clements Sep 09 '13 at 09:42
Uhm. It just occurred to me that `symmetric_difference` won't do, because it matches if and only if `item` contains *exactly* the same elements as `s` (`set('abc').symmetric_difference('aba') -> {'c'}`), while we want `item` to be a `subset` of the elements, so `s.issuperset(item)` should do. :s – Bakuriu Sep 09 '13 at 09:46
@Bakuriu your comment about `frozenset` was correct, tested and seen that `frozenset` is slower here. Thanks. – alecxe Sep 09 '13 at 09:51
@alecxe I believe CPython has mostly the same implementation for `set` and `frozenset` (e.g. there are common functions to create a new `set`, update it etc.; see the `Objects/setobject.c` file in the sources). `frozenset` simply does not provide the methods that have side effects. I removed that comment since the rest was wrong. – Bakuriu Sep 09 '13 at 09:56
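
The `issuperset` suggestion from the comments above can be sketched as follows. This is only an illustration of that idea using the answer's example data; the variable names are my own.

```python
words = ['aba', 'acba', 'caz']
allowed = set('abc')

# issuperset() accepts any iterable, so each word can be passed directly
# without converting it to a set first.
result = [w for w in words if allowed.issuperset(w)]
print(result)  # ['aba', 'acba']
```

Because the existing set does the work, no temporary set has to be built for each word.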

score 6 · Answer 3 · edited May 13 '14 at 20:15

Assuming you only want the strings in your list which have only the characters in your search string, you can easily perform

>>> hay = ['aba', 'acba', 'caz']
>>> needle = set('abc')
>>> [h for h in hay if not set(h) - needle]
['aba', 'acba']

If you want to avoid sets, you can do the same with str.translate (the two-argument form shown here is Python 2 only). In this case, you remove all characters that are in your search string, and a word passes if nothing is left.

>>> needle = 'abc'
>>> [h for h in hay if not h.translate(None,needle)]
['aba', 'acba']
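
The `translate(None, needle)` call above relies on the Python 2 `str` API. A rough Python 3 equivalent, assuming the same example data, builds a deletion table with `str.maketrans`:

```python
hay = ['aba', 'acba', 'caz']
needle = 'abc'

# The third argument of str.maketrans lists the characters to delete.
delete_needle = str.maketrans('', '', needle)

# A word passes if nothing is left after deleting the allowed characters.
result = [h for h in hay if not h.translate(delete_needle)]
print(result)  # ['aba', 'acba']
```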

score 4 · Answer 4 · edited May 13 '14 at 20:19

Something like this:

strings = ['aba', 'acba', 'caz']
given = "abc"
filter(lambda string: all(char in given for char in string), strings)
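
In Python 3, `filter` returns a lazy iterator rather than a list, so the call above needs a `list(...)` wrapper if a list is wanted. A small sketch of the same idea (the name `matches` is mine):

```python
strings = ['aba', 'acba', 'caz']
given = "abc"

# all() is True only if every character of the string is in `given`.
# list() forces the lazy filter object into a list on Python 3.
matches = list(filter(lambda string: all(char in given for char in string), strings))
print(matches)  # ['aba', 'acba']
```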

edited May 13 '14 at 20:19

Cristian Ciupitu


answered Sep 09 '13 at 09:19

Bleeding Fingers


score 1 · Answer 5 · answered Sep 09 '13 at 09:37

The question is somewhat ambiguous about whether letters from the base string may be reused, and about whether repeated or missing letters are allowed. This solution addresses that with a function that takes a reuse parameter:

from collections import Counter

def anagram_filter(data, base, reuse=True):
    if reuse: # all characters in objects in data are in base, count ignored
        base = set(base)
        return [d for d in data if not set(d).difference(base)]
    r = []
    cb = Counter(base)
    for d in data:
        for k, v in Counter(d).items():  # items() works on Python 2 and 3
            if (k not in cb) or (v > cb[k]):
                break
        else:
            r.append(d)
    return r

Usage:

>>> anagram_filter(['aba', 'acba', 'caz'], 'abc')
['aba', 'acba']
>>> anagram_filter(['aba', 'acba', 'caz'], 'abc', False)
[]
>>> anagram_filter(['aba', 'cba', 'caz'], 'abc', False)
['cba']
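
For the no-reuse case, a shorter variant is possible with `Counter` subtraction. This is an alternative sketch, not the answer's own code, and the helper name `can_build` is hypothetical:

```python
from collections import Counter


def can_build(word, base):
    """Return True if word can be spelled using each letter of base at most once."""
    # Counter subtraction keeps only positive counts, so the result is empty
    # exactly when base has at least as many of every letter as word needs.
    return not (Counter(word) - Counter(base))


data = ['aba', 'cba', 'caz']
print([d for d in data if can_build(d, 'abc')])  # ['cba']
```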

score 0 · Answer 6 · answered Sep 09 '13 at 09:23

Below is the code:

a = ['aba', 'acba', 'caz']
needle = 'abc'

def onlyNeedle(word):
    for letter in word:
        if letter not in needle:
            return False

    return True

a = filter(onlyNeedle, a)

print a

answered Sep 09 '13 at 09:23

Snowwolf


score 0 · Answer 7 · answered Sep 09 '13 at 09:29

I will assume your reluctance to use regular expressions is not really an issue:

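import re

strings = ['aba', 'acba', 'caz']
given = "abc"
# The * quantifier lets the character class cover the whole string, not just one character
filter(lambda value: re.match("^[" + given + "]*$", value), strings)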

answered Sep 09 '13 at 09:29

njzk2


`re.escape(given)` would be safer. – Ashwini Chaudhary Sep 09 '13 at 09:31
@Paco : the `of course` part could indicate a prejudice of inherent complexity. I think this prejudice should be corrected. Hence my assumption. I could be wrong, of course, but I try to show that `RegEx or anything` are simple and friendly. – njzk2 Sep 09 '13 at 09:38
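
Following the `re.escape` suggestion in the comments above, a hedged Python 3 sketch might look like this. It assumes `given` is non-empty (an empty character class is a regex error), and the second example string is made up to show why escaping matters.

```python
import re

strings = ['aba', 'acba', 'caz']
given = "abc"

# re.escape neutralises characters such as ']', '^' or '-' that would
# otherwise change the meaning of the character class.
pattern = re.compile("[" + re.escape(given) + "]*")

# fullmatch() requires the whole string to match, so no ^ or $ is needed.
result = [s for s in strings if pattern.fullmatch(s)]
print(result)  # ['aba', 'acba']

# Hypothetical allowed set containing regex-special characters.
tricky = re.compile("[" + re.escape("a-c]") + "]*")
print(bool(tricky.fullmatch("a]c")))  # True
print(bool(tricky.fullmatch("abc")))  # False: 'b' is not in "a-c]"
```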
