
I have data split into files, each identified by a fileid. I am trying to go through the data file by file and search for the emoticons :( and :) as defined by the regex. When an emoticon is found, I need to record a) that an emoticon was found and b) in which fileid. When I run this script and print emoticon, I get 0 as the value. How is this possible? I am a beginner.

import re

emoticon = 0
for fileid in corpus.fileids():
    # Search the file ID string for a run of :( or :) emoticons
    m = re.search(r'^(:\(|:\))+$', fileid)
    if m is not None:
        emoticon += 1
Joren
JohnDoe

1 Answer

It looks to me like your regex itself works: for a string made up entirely of these emoticons, m is not None.

>>> re.search(r'^(:\(|:\))+$', ':)').group()
':)'
>>> re.search(r'^(:\(|:\))+$', ':(').group()
':('
>>> re.search(r'^(:\(|:\))+$', ':):(').group()
':):('
>>> re.search(r'^(:\(|:\))+$', ':)?:(').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

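However, a couple of things look questionable to me.

  • Because of the ^ and $ anchors, this will only match strings that consist entirely of emoticons.
  • Is fileid really what you want to search? It is the file's identifier, not the text inside the file.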
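
To illustrate both points, here is a minimal sketch that searches each file's text instead of its fileid, and drops the anchors so an emoticon can appear anywhere in the sentence. The function name find_emoticons, the texts dict, and the sample data are made up for the example; they are not from your script or from NLTK.

```python
import re

# Unanchored pattern: matches :( or :) anywhere in the text.
EMOTICON_RE = re.compile(r':\(|:\)')

def find_emoticons(texts):
    """Map each file ID to the list of emoticons found in its text.

    `texts` is a hypothetical dict of {fileid: text}. File IDs whose
    text contains no emoticon are left out of the result.
    """
    found = {}
    for fileid, text in texts.items():
        matches = EMOTICON_RE.findall(text)
        if matches:
            found[fileid] = matches
    return found

# Made-up sample data standing in for the corpus contents.
sample = {
    'a.txt': 'Great day :)',
    'b.txt': 'No emoticons here.',
    'c.txt': 'Oh no :( but fine :)',
}
result = find_emoticons(sample)
print(result)
```

With NLTK, the dict could probably be built as {fileid: corpus.raw(fileid) for fileid in corpus.fileids()}, assuming corpus is a reader that provides raw(), as the plaintext corpus readers do. The result then tells you both that an emoticon was found and in which fileid.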
vroomfondel
  • I am using the fileids() function in NLTK. Each file contains one sentence, so I want to check for emoticons in each file, i.e. each sentence. Is this not possible? Oh, and I don't want to match only strings that consist entirely of emoticons, so I'll have to change my regex. Thanks! :) – JohnDoe Aug 17 '13 at 00:12
  • I fixed the mistake I made with the fileid. – JohnDoe Aug 17 '13 at 00:22