8

I would like a regex pattern that matches the smileys ":)" and ":(". It should also accept repeated smileys like ":) :)" and ":) :(", but reject invalid syntax like ":( (".

I have this with me, but it matches ":( ("

bool(re.match(r"(:\()", str))

I may be missing something obvious here, and I'd like some help with this seemingly simple task.

coding_pleasures

4 Answers

9

I think it finally "clicked" exactly what you're asking about here. Take a look at the below:

import re

smiley_pattern = r'^(:\(|:\))+$' # matches one or more of the smileys ":)" and ":(", and nothing else

def test_match(s):
    print('Value: %s; Result: %s' % (
        s,
        'Matches!' if re.match(smiley_pattern, s) else 'Doesn\'t match.'
    ))

should_match = [
    ':)',   # Single smile
    ':(',   # Single frown
    ':):)', # Two smiles
    ':(:(', # Two frowns
    ':):(', # Mix of a smile and a frown
]
should_not_match = [
    '',         # Empty string
    ':(foo',    # Extraneous characters appended
    'foo:(',    # Extraneous characters prepended
    ':( :(',    # Space between frowns
    ':( (',     # Extraneous characters and space appended
    ':(('       # Extraneous duplicate of final character appended
]

print('The following should all match:')
for x in should_match:
    test_match(x)

print('')   # Newline for output clarity

print('The following should all not match:')
for x in should_not_match:
    test_match(x)

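The problem with your original code is the regex itself: (:\(). It only checks that the string starts with ":(" and places no constraint on what follows, which is why ":( (" is accepted. Let's break it down.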

The outside parentheses are a "grouping". They're what you'd reference if you were going to do a string replacement, and are used to apply regex operators on groups of characters at once. So, you're really saying:

  • ( begin a group
    • :\( ... do regex stuff ...
  • ) end the group

The : isn't a regex reserved character, so it's just a colon. The \ is, and it means "the following character is literal, not a regex operator". This is called an "escape sequence". Fully parsed into English, your regex says

  • ( begin a group
    • : a colon character
    • \( a left parenthesis character
  • ) end the group
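
To see this behaviour directly, here is a minimal sketch using the pattern from the question (the variable names are illustrative only). re.match requires a match at the start of the string and ignores whatever comes after it:

```python
import re

original_pattern = r"(:\()"  # the pattern from the question, as a raw string

# re.match only anchors at the START of the string; anything after the
# first ":(" is simply not examined.
match = re.match(original_pattern, ":( (")
accepted = bool(match)
matched_text = match.group(0)

print(accepted)      # True
print(matched_text)  # :(
```

Only the leading ":(" is consumed, so the trailing " (" never gets a chance to cause a failure.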

The regex I used is slightly more complex, but not bad. Let's break it down: ^(:\(|:\))+$.

^ and $ mean "the beginning of the line" and "the end of the line" respectively. Now we have ...

  • ^ beginning of line
    • (:\(|:\))+ ... do regex stuff ...
  • $ end of line

... so it only matches things that comprise the entire line, not simply occur in the middle of the string.

We know that ( and ) denote a grouping. + means "one or more of these". Now we have:

  • ^ beginning of line
  • ( start a group
    • :\(|:\) ... do regex stuff ...
  • ) end the group
  • + match one or more of this
  • $ end of line

Finally, there's the | (pipe) operator. It means "or". So, applying what we know from above about escaping characters, we're ready to complete the translation:

  • ^ beginning of line
  • ( start a group
    • : a colon character
    • \( a left parenthesis character
  • | or
    • : a colon character
    • \) a right parenthesis character
  • ) end the group
  • + match one or more of this
  • $ end of line
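
As an alternative sketch, assuming Python 3.4 or later, re.fullmatch expresses the same "whole string" requirement without ^ and $. The names SMILEY_RUN and is_smiley_run are made up for this example, and (?:...) is used only because the group's contents are not needed afterwards:

```python
import re

# Same alternation as above. re.fullmatch requires the entire string to
# match, so the ^ and $ anchors are not needed. (?:...) groups without capturing.
SMILEY_RUN = re.compile(r"(?::\(|:\))+")

def is_smiley_run(text):
    """Return True if text consists of one or more ':)' / ':(' and nothing else."""
    return SMILEY_RUN.fullmatch(text) is not None

print(is_smiley_run(":):("))   # True
print(is_smiley_run(":( ("))   # False
print(is_smiley_run(""))       # False
```

One practical difference: $ also matches just before a trailing newline, so the anchored pattern accepts ":)\n", while fullmatch rejects it.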

I hope this helps. If not, let me know and I'll be happy to edit my answer with a reply.

Lyndsy Simon
3

Maybe something like:

re.match('[:;][)(](?![)(])', str)
woemler
  • urm..could you explain what exactly does it do? – coding_pleasures Jan 28 '13 at 21:11
  • This regex will match either a `;` or a `:`, followed by either a `)` or `(`, but only when it is then NOT followed by another `)` or `(`. This is probably not the perfect solution, but is at least another way to look at the problem. – woemler Jan 28 '13 at 21:15
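
The lookahead behaviour described in the comment above can be checked with a short sketch (the helper name starts_with_smiley is hypothetical). Note that, used with match, this pattern only inspects the start of the string and the single character after the smiley:

```python
import re

# Pattern from the answer above, written as a raw string.
lookahead_pattern = re.compile(r"[:;][)(](?![)(])")

def starts_with_smiley(text):
    """True if text begins with a smiley not directly followed by ( or )."""
    return lookahead_pattern.match(text) is not None

print(starts_with_smiley(":)"))    # True
print(starts_with_smiley(":(("))   # False: the lookahead sees a second "("
print(starts_with_smiley(":( ("))  # True: a space follows ":(", so the lookahead passes
```

So this pattern rejects ":((" but still accepts the ":( (" case from the question.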
2

Try (?::|;|=)(?:-)?(?:\)|\(|D|P). I haven't tested it extensively, but it does seem to match the right ones and not more...

In [15]: import re

In [16]: s = "Just: to :)) =) test :(:-(( ():: :):) :(:( :P ;)!"

In [17]: re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',s)
Out[17]: [':)', '=)', ':(', ':-(', ':)', ':)', ':(', ':(', ':P', ';)']
root
  • thank you..whats the meaning of (? ..)? I read the documentation, but couldn't understand. I'd be happy if you could explain the pattern a lil bit.. – coding_pleasures Jan 28 '13 at 21:37
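
On the (?:...) question: it is a non-capturing group, which groups part of a pattern without storing what it matched. A small sketch of the visible difference with re.findall (the sample text and variable names are made up for illustration):

```python
import re

text = "ok :-) no :("

# With (?:...) groups nothing is captured, so findall returns the
# full text of each match.
non_capturing = re.findall(r"(?::|;|=)(?:-)?(?:\)|\(|D|P)", text)

# With plain (...) groups, findall returns one tuple of group contents
# per match; an optional group that did not take part yields ''.
capturing = re.findall(r"(:|;|=)(-)?(\)|\(|D|P)", text)

print(non_capturing)  # [':-)', ':(']
print(capturing)      # [(':', '-', ')'), (':', '', '(')]
```

Both patterns match the same text; only what findall hands back differs.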
0

I got the answer I was looking for from the comments and answers posted here.

re.match("^(:[)(])*$",str)

Thanks to all.

coding_pleasures
  • This regex will only work if `str` starts with `:` and only contains repetitions of `:)` and `:(` all the way to the end. Are you sure this is what you are looking for? – woemler Jan 28 '13 at 22:08
  • Yes, that's exactly what I was looking for. I'm sorry for not being able to post the question correctly enough. And thanks for answering. – coding_pleasures Jan 28 '13 at 22:20
  • Hmm... Are you *sure* you're sure? :) The following string would match this regex: `:):):):):(:(:):(:)`. That doesn't seem useful from this side of the monitor. – Lyndsy Simon Jan 28 '13 at 23:01
  • 1
    Also, an empty string matches. `*` means "zero or more occurrences", `+` means "one or more occurrences", and `?` means "zero or one occurrence". – Lyndsy Simon Jan 28 '13 at 23:03
  • Yes, I agree it doesn't make sense generally. But I was using it to solve a programming contest problem. – coding_pleasures Feb 03 '13 at 16:55
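
The point about `*` versus `+` raised in the comments can be seen in a short sketch (variable names are illustrative; the `+` variant is not from the accepted pattern, it is shown only for comparison):

```python
import re

star_pattern = r"^(:[)(])*$"  # zero or more smileys (the pattern from this answer)
plus_pattern = r"^(:[)(])+$"  # one or more smileys (comparison variant)

empty_with_star = bool(re.match(star_pattern, ""))
empty_with_plus = bool(re.match(plus_pattern, ""))
run_with_star = bool(re.match(star_pattern, ":):("))
run_with_plus = bool(re.match(plus_pattern, ":):("))

print(empty_with_star, empty_with_plus)  # True False
print(run_with_star, run_with_plus)      # True True
```

Whether the empty string should count as valid depends on the problem being solved; the two patterns differ only in that case.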