regular expressions emoticons

Question

I have data split into fileids. I am trying to go through the data one fileid at a time and search for the emoticons :( and :) as defined by the regex. If an emoticon is found, I need to record (a) that an emoticon was found and (b) in which fileid. When I run this piece of script and print the emoticon dictionary, I get 0 as the value. How is this possible? I am a beginner.

emoticon = 0
for fileid in corpus.fileids():
    m = re.search('^(:\(|:\))+$', fileid)
    if m is not None:
        emoticon +=1

score 1 · Accepted Answer · answered Aug 17 '13 at 00:06

It looks to me like your regex is working, and that m should indeed not be None.

>>> re.search('^(:\(|:\))+$', ':)').group()
':)'
>>> re.search('^(:\(|:\))+$', ':(').group()
':('
>>> re.search('^(:\(|:\))+$', ':):(').group()
':):('
>>> re.search('^(:\(|:\))+$', ':)?:(').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

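However, a couple of things look questionable to me:

- Because of the ^ and $ anchors, this will only match strings that consist entirely of emoticons.
- Is fileid really what you want to search? A fileid identifies a file; it is not the file's text.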
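To make both points concrete, here is a minimal sketch of how the loop could look if it searched each file's text with an unanchored pattern and recorded which fileids contain emoticons. The `documents` dict and the `find_emoticons` helper are hypothetical stand-ins so the example runs without a corpus; with an NLTK corpus reader I would expect the text to come from something like `corpus.raw(fileid)`, but check that against your own setup.

```python
import re

# Unanchored pattern: matches :( or :) anywhere in the text,
# not only when the whole string is made of emoticons.
EMOTICON = re.compile(r':\(|:\)')

# Hypothetical stand-in for a corpus: fileid -> text of that file.
# With an NLTK corpus reader the text would presumably come from
# corpus.raw(fileid) instead of this dict.
documents = {
    'a.txt': 'Great day :)',
    'b.txt': 'No emoticons here.',
    'c.txt': 'Missed the bus :( but got a lift :)',
}

def find_emoticons(docs):
    """Return {fileid: [emoticons found]} for files with at least one match."""
    found = {}
    for fileid, text in docs.items():
        matches = EMOTICON.findall(text)  # search the text, not the fileid
        if matches:
            found[fileid] = matches
    return found

emoticons = find_emoticons(documents)
print(emoticons)       # which emoticons were found, per fileid
print(len(emoticons))  # how many files contain at least one emoticon
```

Searching the fileid itself (for example 'a.txt') with the original anchored regex returns None every time, which would explain a count of 0.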

vroomfondel


I am using the fileids function in NLTK. Each file contains one sentence, so I want to check each file (that is, each sentence) for emoticons. Is this not possible? Oh, and I don't want to match only strings that consist entirely of emoticons, so I'll have to change my regex. Thanks! :) – JohnDoe Aug 17 '13 at 00:12
I fixed the mistake I made with the fileid. – JohnDoe Aug 17 '13 at 00:22
