I want to select the sentences that contain any of the conjunctions listed in my code, but I am getting this error:

Traceback (most recent call last):
  File "positive_process3.py", line 14, in <module>
    if word in text:
TypeError: 'str' does not support the buffer interface

My code is:

import xlrd
from xlrd import open_workbook
import xlwt
wb = open_workbook("C:/Users/SA769740/Desktop/result2/pos.xlsx")
book = xlwt.Workbook(encoding="utf-8")
sheet1 = book.add_sheet("Sheet 1")
wordSet = [' for ', ' so ',' since ', ' Since ', ' because ', ' as ', ' As ', ' due to ', ' Due to ']
count=1
for sheet in wb.sheets():
    for row in range(sheet.nrows):
        text = ((sheet.cell(row,2).value).encode("utf-8"))
        l = ""
        for word in wordSet:
            if word in text:
                l += (word+" ")
        sheet1.write(row,0,sheet.cell(row, 0).value)
        sheet1.write(row,3, l)
        sheet1.write(row,4,count)
        sheet1.write(row,5,value)

        count += 1

book.save('C:/Users/SA769740/Desktop/result2/pos_reviews_process3.xls')

I am using Python 3.4.3.

Martijn Pieters
S.De
  • Please [edit] your question and fix the indentation. Right now your code is not valid Python. –  Apr 05 '16 at 09:12
  • Are you **sure** you are using Python 2? The exception strongly suggests you are using Python 3. `text` is encoded from Unicode, and `word` from `wordSet` is a plain string literal. I can only reproduce your exception on Python 3. – Martijn Pieters Apr 05 '16 at 09:14
  • Probable duplicate of http://stackoverflow.com/questions/5471158/typeerror-str-does-not-support-the-buffer-interface – cdarke Apr 05 '16 at 09:16
  • At any rate, the obvious solution would be to either *actually* use Python 2, or not encode `text` to UTF-8. In Python 2, I'd use unicode strings for the `wordSet` list however. – Martijn Pieters Apr 05 '16 at 09:19
  • Thank you @MartijnPieters. It worked without the encoding part. – S.De Apr 05 '16 at 09:22
  • @S.De: yes, because you are using Python 3. – Martijn Pieters Apr 05 '16 at 09:23
  • @MartijnPieters: this is exactly the wording in the (possible) duplicate, and there are other instances from a cursory web search. Looks like the message changed, question is, can you get this message from 2.7? – cdarke Apr 05 '16 at 09:24
  • @cdarke: you can't, not for that test at least. I must've erred somewhere, 3.4 does produce this message (using *interface*, not *API*). – Martijn Pieters Apr 05 '16 at 09:25
  • @cdarke: I'm looking for a good dupe target for using `in` or `==` or any other unicode-to-bytes comparison. – Martijn Pieters Apr 05 '16 at 09:27
  • @S.De: please be careful about reporting your Python version in future. If you had done a web search for your error message you would have found loads of explanations. Saying that you were on 2.7 wasted a lot of effort. – cdarke Apr 05 '16 at 09:28
  • @cdarke: and though the *error message* is the same, there can be multiple reasons that you run into it. – Martijn Pieters Apr 05 '16 at 09:31

1 Answer

You are not using Python 2. You are using Python 3, and `word in text` is testing a str object (`word`) against a bytes object (`text`), which Python 3 does not allow.
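
A minimal sketch of that mismatch, using a made-up sentence in place of the spreadsheet cell (the `sentence` value is an assumption, not taken from your file). The wording of the message differs between Python 3 releases, so the snippet only records the exception type:

```python
# Hypothetical cell value; in the question it comes from sheet.cell(row, 2).value
sentence = "I stayed home because it rained"

encoded = sentence.encode("utf-8")  # bytes, as in the question's code
word = " because "                  # str, as in wordSet

try:
    found = word in encoded         # str needle, bytes haystack
except TypeError as exc:
    found = None
    error_type = type(exc).__name__
else:
    error_type = None

print(type(word).__name__, "in", type(encoded).__name__, "->", error_type)
```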

The solution is to either switch to Python 2, or to not use str.encode() on the text value:

text = sheet.cell(row, 2).value
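
As a rough sketch of the inner loop with that change applied: the helper name `matching_words` and the sample sentences are hypothetical, and a plain list stands in for the cell values so the snippet runs without xlrd or your workbook.

```python
word_set = [' for ', ' so ', ' since ', ' Since ', ' because ',
            ' as ', ' As ', ' due to ', ' Due to ']

def matching_words(text, words=word_set):
    """Return the entries of `words` that occur in `text` (both are str)."""
    return [w for w in words if w in text]

# Hypothetical cell values standing in for sheet.cell(row, 2).value
cells = ["It was late so we left", "Closed due to rain", "Nothing here"]

for text in cells:
    # No .encode() call: text stays a str, so `w in text` is str-in-str
    print(repr(text), "->", matching_words(text))
```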

Even if you fix your Python version and run this on Python 2, you should use Unicode values everywhere and not encode your text to UTF-8. When using text comparisons with UTF-8 encoded data you could end up with partial byte-sequence matches.
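
One way such a byte-level match can arise is sketched below. It assumes the search bytes come from a different encoding than the text (Latin-1 here, chosen purely for illustration), so a single byte lines up with the second half of a two-byte UTF-8 character:

```python
text = "caf\u00e9"    # "café"
needle = "\u00a9"     # the copyright sign, which does not occur in the text

text_utf8 = text.encode("utf-8")          # b'caf\xc3\xa9'
needle_latin1 = needle.encode("latin-1")  # b'\xa9'

# Character-level test on str objects: no match
str_match = needle in text

# Byte-level test: b'\xa9' matches the trailing byte of the encoded "é"
bytes_match = needle_latin1 in text_utf8

print(str_match, bytes_match)
```

Comparing str objects avoids this, because the test then operates on whole characters rather than on raw bytes.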

Martijn Pieters