This code is supposed to get a string value from an Excel file. However, the value is not being recognized as a string. How can I get `query` as a string? `str(string)` doesn't seem to work.

import xlrd

def main():
    file_location = "/Users/ronald/Desktop/Twitter/TwitterData.xlsx" 
    workbook = xlrd.open_workbook(file_location) #open work book
    worksheet = workbook.sheet_by_index(0)
    num_rows = worksheet.nrows - 1
    num_cells = worksheet.ncols - 1
    curr_row = 0
    curr_cell = 3
    count = 0
    string = 'tweet'
    tweets = []
    while curr_row < num_rows:
        curr_row += 1
        tweet = worksheet.cell_value(curr_row, curr_cell)
        tweet.encode('ascii', 'ignore')
        #print tweet
        query = str(tweet)
        if (isinstance(query, str)):
            print "it is a string"
        else:
            print "it is not a string"

This is the error I keep getting:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 102-104: ordinal not in range(128)

user3078335
  • Is your above code printing 'it is not a string' ? Highly unlikely. – Anand S Kumar Aug 06 '15 at 18:54
  • Yep. It was, until I added this 'query = str(tweet)' and then I got the error above. @AnandSKumar – user3078335 Aug 06 '15 at 18:55
  • Why do you want to encode using `ascii` ? – Anand S Kumar Aug 06 '15 at 19:01
  • I googled the problem, and a solution I found was to try it that way. It didn't work. All I am trying to do is convert it to a string. – user3078335 Aug 06 '15 at 19:03
  • Don't do that, instead try `print(type(tweet))` and `print(repr(tweet))` . And check what its type is. – Anand S Kumar Aug 06 '15 at 19:05
  • This is the output: `<type 'unicode'>` `u"I'll take Taylor Swift serious when she grows those horrible bangs out #terrible #youlook12 #Grammys #opener"` – user3078335 Aug 06 '15 at 19:10
  • Why are you using `str` instead of `unicode`, and why are you checking `isinstance` against `str` instead of `basestring`? – Two-Bit Alchemist Aug 06 '15 at 19:10
  • `<type 'unicode'>` -- so it's a string... https://stackoverflow.com/questions/18034272/python-str-vs-unicode-types – Two-Bit Alchemist Aug 06 '15 at 19:11
  • @Two-BitAlchemist I am still new to Python. I didn't know I could do that. I am still learning as I go along. Thanks. – user3078335 Aug 06 '15 at 19:13
  • @Two-BitAlchemist I am trying to get the tweet (which is supposed to be a string) and then pass it as a parameter to a function. But the function wasn't seeing the tweet as a string. That's my main issue. The check is there to see whether the tweet was actually a string. – user3078335 Aug 06 '15 at 19:20
  • @user3078335 Read the question I linked. There are two distinct types in Python (`str` and `unicode` in Python 2, and `bytes` and `str` in Python 3) that are both string types. The tweet is unicode (which you almost certainly want) and is therefore a string. Your test is invalid because it only tests for one type. – Two-Bit Alchemist Aug 06 '15 at 19:21
  • Thanks!! I figured it out. Post the last comment as an answer so I can mark it as answer. – user3078335 Aug 06 '15 at 19:43

1 Answer

There are two distinct types in Python that both represent strings in different ways.

  1. `str` (Python 2) or `bytes` (Python 3): This is the default string type in Python 2 (hence the name `str`) and is called `bytes` in Python 3. It represents a string as a series of bytes, which does not work well for Unicode text, because a character is not necessarily one byte as it is in ASCII and some other encodings.

  2. `unicode` (Python 2) or `str` (Python 3): This is the default string type in Python 3. It handles accented and international characters, so it is what you want, especially when dealing with something like Twitter. In Python 2, this type is also why some strings are shown with the little `u''` prefix.
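To make the difference between the two types concrete, here is a minimal sketch that should behave the same on Python 2 and Python 3. The sample text and the variable names (`text`, `data`, `ascii_only`) are made up for illustration and are not from the question.

```python
# A short piece of text containing one non-ASCII character (e with acute accent).
text = u"caf\u00e9"

# Encoding turns the text into bytes. In UTF-8 the accented letter takes two bytes.
data = text.encode("utf-8")

print(len(text))  # number of characters: 4
print(len(data))  # number of bytes: 5

# encode() returns a new object; it does not modify `text` in place.
# With errors="ignore", characters that ASCII cannot represent are dropped.
ascii_only = text.encode("ascii", "ignore")
print(repr(ascii_only))

# Decoding the bytes with the same codec gives the original text back.
print(data.decode("utf-8") == text)  # True
```

This is also why a bare `tweet.encode('ascii', 'ignore')` line has no visible effect: the encoded result is thrown away unless it is assigned to a name.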

Your "is this a string?" test is `isinstance(s, str)`, which only tests for the first type and ignores the other. Instead, test against `basestring` with `isinstance(s, basestring)`, since it is the parent class of both `str` and `unicode`. That properly answers the question "is this a string?" in Python 2, and the narrower test is why you were getting misleading results.

Note that `basestring` does not exist in Python 3, so this test works in Python 2 only.
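If the same check has to run on both versions, one common approach is to fall back to `str` when `basestring` is missing. The following is only a sketch of that idea; `string_types` and `is_string` are hypothetical names chosen for this example, not part of any library used in the question.

```python
# Python 2 defines basestring (the parent of str and unicode).
# Python 3 does not, so looking it up raises NameError there.
try:
    string_types = (basestring,)  # Python 2: matches both str and unicode
except NameError:
    string_types = (str,)         # Python 3: str is the text type

def is_string(value):
    """Return True if value is a string type on the running Python version."""
    return isinstance(value, string_types)

tweet = u"I'll take it seriously #Grammys"  # made-up sample text
print(is_string(tweet))  # True
print(is_string(42))     # False
```

One difference to keep in mind: on Python 2 this also accepts byte strings (`str`), while on Python 3 a `bytes` object is not treated as a string by this check.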

Two-Bit Alchemist