Unicode vs ASCII - Issue processing strings with functions in string and re modules

Question

I am using the string and re modules to process text (find striped words in a sentence) to solve a problem on checkIO in Python 2.7. When I run my Python script on my computer, I get no error.

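import re, string

text = "My name is ..."  # a plain byte str when run locally under Python 2.7

# [A-Za-z0-9] rather than [A-z0-9]: the A-z range also matches [ \ ] ^ _ and `
init_word_list = re.findall(r'[A-Za-z0-9]+', text)

word_list = []

for k in init_word_list:
    print type(k), repr(k)  # debugging: show the type of each match
    if str.isdigit(k):      # this is the call that fails on checkIO
        word_list.append(k)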

However, I get the following TypeError when I run the same code on checkIO:

TypeError: descriptor 'isdigit' requires a 'str' object but received a 'unicode'

As you may have noticed, I inserted type() and repr() to see what type Python gives my strings at that point. Here is the output:

<type 'unicode'> u'My'
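A minimal sketch of why the types differ, assuming checkIO passes the input as a `unicode` object (the `u'My'` repr suggests it does). `re.findall` returns matches of the same type as the string it searches, so every `k` is `unicode`, and `str.isdigit` is an unbound `str` method that only accepts `str` instances. The sample sentence is hypothetical; the code avoids Python-2-only syntax so it also runs on Python 3.

```python
import re

# Hypothetical input; the u prefix makes it unicode on Python 2 (a plain str on Python 3).
text = u"My name is Bob and I am 42"

matches = re.findall(r'[A-Za-z0-9]+', text)

# Every match has the same type as the string that was searched.
for k in matches:
    print("%s %r" % (type(k), k))

# On Python 2, str.isdigit(k) raises TypeError here because k is unicode.
# Calling the method on the object itself dispatches to the right type.
digits = [k for k in matches if k.isdigit()]
print(digits)
```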

I would like to know whether I am doing something wrong. Also, what are my options for solving this? Should I convert from unicode to ASCII before calling str.isdigit()? Or should I do the alphabet check with the re module? I suspect people will point me to the checkIO forums to understand why checkIO handles the script differently from Python on my computer, but if someone here understands that too, great. :)

Accepted Answer (edited May 23 '17 at 11:43)

I found one fix: encode each match to an ASCII byte string with encode("ascii", "ignore") before calling the str method. I replaced the loop in my code above with the following:

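for k in init_word_list:
    # unicode -> byte str; "ignore" drops any non-ASCII characters
    ascii_k = k.encode("ascii", "ignore")
    if str.isalpha(ascii_k):  # str.isalpha now receives a str, so no TypeError
        word_list.append(k)   # keep the original (unicode) match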
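If the goal is only to avoid the TypeError, here is a sketch of two alternatives that skip the ASCII conversion. These are suggestions, not part of the fix above, and the helper names and sample input are hypothetical. The first calls the method on each match (`k.isalpha()`), which works for `str` and `unicode` alike; the second does the alphabet check with the re module, as the question asks.

```python
import re

# Hypothetical helpers; both return the purely alphabetic tokens.
def alpha_words_method(text):
    """Filter with the match's own isalpha(); works for str and unicode."""
    return [k for k in re.findall(r'[A-Za-z0-9]+', text) if k.isalpha()]

ALPHA_ONLY = re.compile(r'[A-Za-z]+\Z')  # \Z anchors at the very end (unlike $)

def alpha_words_regex(text):
    """Filter with a regex instead of a string method."""
    return [k for k in re.findall(r'[A-Za-z0-9]+', text) if ALPHA_ONLY.match(k)]

sample = u"Hello world abc123 42 ok"  # hypothetical input
print(alpha_words_method(sample))
print(alpha_words_regex(sample))
```

Because the tokens come from an ASCII-only pattern, both filters agree: mixed tokens such as `abc123` and pure numbers such as `42` are rejected by each.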

After some more searching, I learned that ASCII is a subset of the Unicode character set (link). Because the strings checkIO passed me contained only characters in the ASCII subset, the conversion lost nothing. Be careful with this kind of conversion in general, though: with "ignore", any non-ASCII character is silently dropped, so u'café' would be checked as 'caf'.
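A small sketch of the data loss that "ignore" can hide, using a hypothetical non-ASCII word. It runs on Python 2.7 and 3; the default "strict" error handler is shown for comparison.

```python
# Hypothetical non-ASCII word, written with an escape so the file stays ASCII.
word = u"caf\u00e9"

# "ignore" silently drops characters that have no ASCII encoding.
stripped = word.encode("ascii", "ignore")
print(repr(stripped))

# Both versions pass isalpha(), so the check gives no hint that a letter was lost.
print("%s %s" % (word.isalpha(), stripped.isalpha()))

# "strict" (the default) raises instead, which makes the data loss visible.
try:
    word.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encoding failed: %s" % exc.reason)
```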
