
I have some tweets that contain special characters and symbols. I am trying to find a word in those tweets by converting them to lower case. The function throws an "AttributeError" when it encounters those records, so I want to change it so that it skips them and processes the others.

Can I catch "AttributeError" with an exception handler in Python? I want it to behave like an "IFERROR" / "On Error Resume Next" error-handling statement.

I am currently using:

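import re

def word_in_text(word, text):
    try:
        print(text)
        word = word.lower()
        text = text.lower()
        match = re.search(word, text)
        if match:
            return True
        else:
            return False
    except (AttributeError, TypeError):
        # `continue` is only valid inside a loop; returning False
        # tells the caller this record does not match, so it can move on.
        return False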
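`continue` is only legal inside a `for` or `while` loop, so it cannot sit in the `except` clause of a function that has no loop. A minimal sketch of the "skip the bad record and keep going" behavior, assuming the tweets arrive as a plain Python list (the names `find_matches` and `tweets` are hypothetical), puts the `try`/`except` in the loop that processes the records:

```python
import re


def word_in_text(word, text):
    # Raises AttributeError if text has no .lower() (e.g. a float),
    # which the caller below catches.
    return re.search(word.lower(), text.lower()) is not None


def find_matches(word, tweets):
    """Return the tweets that contain word, skipping records that error out."""
    matches = []
    for text in tweets:
        try:
            if word_in_text(word, text):
                matches.append(text)
        except AttributeError:
            # The "resume next" part: ignore this record and continue the loop.
            continue
    return matches


# Hypothetical sample data: the float stands in for a missing tweet.
tweets = ["CANCION! You &amp", float("nan"), "nothing here", "you again"]
print(find_matches("you", tweets))
```

Catching only `AttributeError` (rather than every `Exception`) keeps genuine bugs elsewhere in the loop from being silently swallowed.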

Error after applying @galah92's recommendations:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 2220, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1088, in pandas.lib.map_infer (pandas\lib.c:63043)
  File "<input>", line 1, in <lambda>
  File "<input>", line 3, in word_in_text
  File "C:\Python27\lib\re.py", line 146, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
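The last line is a `TypeError`, not an `AttributeError`: with `re.IGNORECASE` there is no `.lower()` call left to fail, so the non-string value reaches `re.search()`, which rejects it. A common cause in a pandas column is a missing tweet stored as `NaN` (a `float`); that is an assumption here, not something the log confirms. This Python 3 sketch (where the message begins "expected string or bytes-like object" instead of Python 2's "expected string or buffer") reproduces the error and guards against it with `isinstance`:

```python
import re


def word_in_text(word, text):
    # Skip anything that is not text (e.g. NaN floats from missing values).
    # In Python 2 the equivalent check would be isinstance(text, basestring).
    if not isinstance(text, str):
        return False
    return re.search(word, text, re.IGNORECASE) is not None


# Reproduce the original error: re.search() refuses a float.
try:
    re.search("you", float("nan"), re.IGNORECASE)
except TypeError as exc:
    print("TypeError:", exc)

print(word_in_text("you", float("nan")))    # False, no exception
print(word_in_text("you", "CANCION! You"))  # True
```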

I am new to Python and teaching myself. Any help would be really appreciated.

1 Answer

You can pass the re.IGNORECASE flag to re.search().
That way you don't need to call lower() or deal with the AttributeError it raises.

import re

def word_in_text(word, text):
    print(text)
    # re.IGNORECASE makes the match case-insensitive, so no lower() calls are needed
    return bool(re.search(word, text, re.IGNORECASE))
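As a quick sanity check, this Python 3 sketch compares the flag against the question's approach of lowercasing both strings (the helper names `with_lower` and `with_flag` are made up for the comparison). For plain ASCII words like the one in the question, both give the same answers:

```python
import re


def with_lower(word, text):
    # The question's approach: lowercase both sides, then search.
    return re.search(word.lower(), text.lower()) is not None


def with_flag(word, text):
    # The answer's approach: let the regex engine ignore case.
    return re.search(word, text, re.IGNORECASE) is not None


samples = ["CANCION! You &amp", "YOU there", "nothing here"]
for text in samples:
    print(text, with_lower("You", text), with_flag("You", text))
```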

As an example, if I run:

from __future__ import unicode_literals # see edit notes
import re

text = "CANCION! You &amp"
word = "you"

def word_in_text(word, text):
    print(text)
    if re.search(word, text, re.IGNORECASE):
        return True
    else:
        return False

print(word_in_text(word, text))

The output is:

CANCION! You &amp
True
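Note that `re.search(word, text)` matches substrings, so "you" is also found in "your" or "bayou". If only whole words should count, one possible variant (an extension beyond the original answer; `whole_word_in_text` is a hypothetical name) wraps the word in `\b` word boundaries and passes it through `re.escape()` so characters such as `.` in the search term are matched literally instead of as regex syntax:

```python
import re


def whole_word_in_text(word, text):
    # \b marks a word boundary; re.escape() makes the word match literally.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


print(whole_word_in_text("you", "CANCION! You &amp"))    # True
print(whole_word_in_text("you", "Is this your tweet?"))  # False
```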

EDIT

For Python 2, add from __future__ import unicode_literals at the top of your script so that plain string literals are unicode objects (as they are in Python 3) rather than byte strings.
You can read more about it here.
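In Python 3 every string literal is already Unicode, so the `__future__` import is unnecessary there, and both `lower()` and `re.IGNORECASE` handle non-ASCII letters. A small Python 3 check, using an accented sample string chosen only for illustration:

```python
import re

# In Python 2 these literals would only be unicode objects with
# `from __future__ import unicode_literals`; in Python 3 they already are.
text = "¡CANCIÓN! You &amp"
word = "canción"

print(type(text).__name__)                         # str
print(text.lower())                                # ¡canción! you &amp
print(bool(re.search(word, text, re.IGNORECASE)))  # True
```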

galah92
  • For me it says: **text = "CANCION! You &amp" :reference to invalid character number: line 1, column 113** – Gagan Oberoi Sep 07 '16 at 22:19
  • I tried it, but it still gives the same error. I am using Eclipse for Python; do you think that could be the issue? – Gagan Oberoi Sep 07 '16 at 22:52
  • Unlikely. Can you edit your question and add a full log of the error? – galah92 Sep 07 '16 at 22:54
  • You've got a `TypeError` (the last line in the log). Make sure `text` and `word` are strings. You can try `print type(text)` and `print type(word)` before the `search()` line. – galah92 Sep 08 '16 at 03:57
  • Thanks for your help. It shows the type as Unicode. I have added **from __future__ import unicode_literals**. As a workaround, I tried to strip the Unicode characters from the text, but it gives the following error: **UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-19: ordinal not in range(128)**. Do you have any suggestions? I have all the tweets in a pandas data frame. – Gagan Oberoi Sep 08 '16 at 15:59
  • Not particularly, but I believe your answer is right [here](http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) or [here](http://stackoverflow.com/questions/31137552/unicodeencodeerror-ascii-codec-cant-encode-character-at-special-name). – galah92 Sep 08 '16 at 16:15
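The `UnicodeEncodeError` mentioned in the comments comes from converting Unicode text to ASCII without saying what to do with characters outside the ASCII range. If stripping those characters really is the goal (it discards accented letters and emoji too), one explicit way is to encode with `errors="ignore"` and decode back. This is a Python 3 sketch and `strip_non_ascii` is a hypothetical helper. Searching does not need this step at all, since `re.search()` works on Unicode text directly; for a pandas column, `Series.str.contains(word, case=False, na=False)` is another option that treats missing values as non-matches.

```python
def strip_non_ascii(text):
    # Characters that cannot be encoded as ASCII are dropped instead of
    # raising UnicodeEncodeError; decode() turns the bytes back into str.
    return text.encode("ascii", errors="ignore").decode("ascii")


tweet = "Café \u2764 night"  # \u2764 is a heart symbol

# Without errors="ignore", the default "strict" mode raises the error.
try:
    tweet.encode("ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc)

print(strip_non_ascii(tweet))  # Caf  night
```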