I'm trying to get re.search to find strings that don't have the letter p in them. My regex code returns every string in the list, which is not what I want. I wrote an alternate, non-regex solution that gives exactly the results I want, but I'd like to know whether this can be solved with re.search; I'll also accept another regex solution. I also tried re.findall, which didn't work, and re.match won't work because it only looks for the pattern at the beginning of a string.

import re

someList = ['python', 'ppython', 'ython', 'cython', '.python', '.ythop', 'zython', 'cpython', 'www.python.org', 'xyzthon', 'perl', 'javap', 'c++']

# this returns everything from the source list which is what I DON'T want
pattern = re.compile('[^p]')
result = []

for word in someList:
    if pattern.search(word):
        result.append(word)
print '\n', result
''' ['python', 'ppython', 'ython', 'cython', '.python', '.ythop', 'zython', 'cpython', 'www.python.org', 'xyzthon', 'perl', 'javap', 'c++'] '''

# this non regex solution returns the results I want
cnt = 0; no_p = []

for word in someList:
    for letter in word:
        if letter == 'p':
            cnt += 1
            pass
    if cnt == 0:
        no_p.append(word)
    cnt = 0
print '\n', no_p
''' ['ython', 'cython', 'zython', 'xyzthon', 'c++'] '''
Michael Swartz

2 Answers

You are almost there. The pattern you are using looks for at least one character that is not 'p'. You need a stricter one that requires every character, from the start of the string to the end, to be something other than 'p'. Try:

pattern = re.compile('^[^p]*$')
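
A minimal sketch of how this anchored pattern behaves on the list from the question, written for Python 3 (so `print` is a function, unlike in the question's code). `^` and `$` pin the match to both ends of the string, and `[^p]*` allows only non-`p` characters in between, so a single `p` anywhere makes the search fail. The names `words` and `no_p_pattern` are illustrative.

```python
import re

words = ['python', 'ppython', 'ython', 'cython', '.python', '.ythop',
         'zython', 'cpython', 'www.python.org', 'xyzthon', 'perl',
         'javap', 'c++']

# ^ and $ anchor the match to the whole string; [^p]* permits only
# characters other than 'p' between them.
no_p_pattern = re.compile(r'^[^p]*$')

result = [word for word in words if no_p_pattern.search(word)]
print(result)  # ['ython', 'cython', 'zython', 'xyzthon', 'c++']
```

Note that this pattern also accepts the empty string, since `[^p]*` can match zero characters.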
fmdra

Your understanding of character-set negation is flawed. The regex [^p] will match any string that has a character other than p in it, which is all of your strings. To "negate" a regex, simply negate the condition in the if statement. So:

import re

someList = ['python', 'ppython', 'ython', 'cython', '.python', '.ythop', 'zython', 'cpython', 'www.python.org', 'xyzthon', 'perl', 'javap', 'c++']

pattern = re.compile('p')
result = []
for word in someList:
    if not pattern.search(word):
        result.append(word)
print result
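
To see why the original `[^p]` pattern let every word through, it can help to print what `search` actually found. The sketch below (Python 3, with the illustrative name `first_non_p`) suggests that the match is just the first character that is not `p`, and that the search fails only for a string made up entirely of `p`s.

```python
import re

first_non_p = re.compile(r'[^p]')

# search() succeeds as soon as it finds any single non-'p' character.
for word in ['python', 'perl', 'c++', 'ppp']:
    match = first_non_p.search(word)
    print(word, '->', match.group() if match else None)
# python -> y
# perl -> e
# c++ -> c
# ppp -> None
```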

It is, of course, rather pointless to use a regex to see if a single specific character is in the string. Your second attempt is more apt for this, but it could be coded better:

result = []
for word in someList:
    if 'p' not in word:
        result.append(word)
print result
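
The same membership test also fits in a list comprehension, which is a common way to write this kind of filter. A minimal sketch in Python 3 syntax, with `no_p` as an illustrative name:

```python
someList = ['python', 'ppython', 'ython', 'cython', '.python', '.ythop',
            'zython', 'cpython', 'www.python.org', 'xyzthon', 'perl',
            'javap', 'c++']

# Keep only the words in which 'p' does not occur anywhere.
no_p = [word for word in someList if 'p' not in word]
print(no_p)  # ['ython', 'cython', 'zython', 'xyzthon', 'c++']
```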
ooga
  • Replacing my original pattern with `re.compile('^((?!p).)*$')` also works. I found that solution on this site at http://stackoverflow.com/questions/717644/regular-expression-that-doesnt-contain-certain-string?rq=1 – Michael Swartz Apr 29 '14 at 05:07
  • @MichaelSwartz: That's a more general solution, useful if the "prohibited string" is longer than a single character (or is even an entire regex). For the simple requirement of a string not containing a *single* character, ooga's or fmdra's answer is the best solution. ooga's will possibly be faster, though that would need profiling. – Tim Pietzcker Apr 29 '14 at 05:20
  • @ooga: Yes, I see your point. Thanks for clearing that up; now my understanding of character set negation is less flawed. – Michael Swartz Apr 29 '14 at 05:29
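
As a rough illustration of the more general lookahead pattern mentioned in the comments, the sketch below rejects words containing a longer substring. The forbidden string `'py'`, the helper `words_without`, and the use of `re.escape` are assumptions made for this example, not part of the original answers; the code uses Python 3 syntax.

```python
import re

def words_without(forbidden, words):
    """Return the words that do not contain the literal string `forbidden`."""
    # (?!...) is a negative lookahead: at each position, assert that the
    # forbidden text does not start here, then consume one character.
    pattern = re.compile(r'^(?:(?!%s).)*$' % re.escape(forbidden))
    return [word for word in words if pattern.search(word)]

some_list = ['python', 'ppython', 'ython', 'cython', '.python', '.ythop',
             'zython', 'cpython', 'www.python.org', 'xyzthon', 'perl',
             'javap', 'c++']

print(words_without('py', some_list))
# ['ython', 'cython', '.ythop', 'zython', 'xyzthon', 'perl', 'javap', 'c++']
print(words_without('p', some_list))
# ['ython', 'cython', 'zython', 'xyzthon', 'c++']
```

For a literal substring, `forbidden not in word` does the same job more directly; the lookahead form is mainly worth its extra complexity when the prohibited part is itself a regular expression.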