I am using Python 2.7.3 and BeautifulSoup to grab data from a website's table, then using codecs to write the content to a file. One of the variables I collect occasionally has garbled characters in it. For example, suppose the website table looks like this:
Year Name City State
2000 John D’Iberville MS
2001 Steve Arlington VA
When I generate my City variable, I always encode it as utf-8:
Year = foo.text
Name = foo1.text
City = foo3.text.encode('utf-8').strip()
State = foo4.text
RowsData = ("{0},{1},{2},{3}").format(Year, Name, City, State)
As a result, the two lists of comma-separated strings I create, RowHeaders and RowsData, look like this:
RowHeaders = ['Year,Name,City,State']
RowsData = ['2000, John, D\xc3\xa2\xe2\x82\xac\xe2\x84\xa2Iberville, MS',
'2001, Steve, Arlington, VA']
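The escaped bytes in the City value can be checked in isolation. The sketch below assumes the original character was a right single quotation mark (U+2019) whose UTF-8 bytes were read as Windows-1252 somewhere upstream; that is a guess, not something I have confirmed, and the names `intended`, `garbled`, and `as_bytes` are only for illustration.

```python
# Hypothetical check; runs on Python 2.7 and Python 3.

# Assumption: the apostrophe in the city name was originally U+2019.
intended = u'\u2019'

# Reading its UTF-8 bytes (e2 80 99) as cp1252 yields three characters.
garbled = intended.encode('utf-8').decode('cp1252')

# Encoding those three characters as UTF-8 gives the bytes seen in RowsData.
as_bytes = garbled.encode('utf-8')

print(repr(garbled))
print(repr(as_bytes))
```

If that assumption holds, the garbling is already present in the text before my `.encode('utf-8')` call, and the encode step only turns the three garbled characters into eight non-ASCII bytes.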
Then I attempt to write this to a file using the following code:
file1 = codecs.open("Outfile.csv","wb","utf8")
file1.write(RowHeaders + u'\n')
line = "\n".join(RowsData)
file1.write(line + u'\r\n')
file1.close()
and I get the following error:
Traceback (most recent call last):
File "HSRecruitsFBByPosition.py", line 141, in <module>
file1.write(line + u'\r\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6879: ordinal not in range(128)
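The same kind of exception can be reproduced without the file or the scraper. This sketch assumes the failing step is the implicit ASCII decode that Python 2 performs when a byte str is combined with a unicode string such as u'\r\n'; the decode is written out explicitly here so the snippet behaves the same on Python 2 and 3. The names `city_bytes` and `message` are hypothetical.

```python
# Hypothetical reproduction; city_bytes stands in for one encoded City value.
city_bytes = u'D\xe2\u20ac\u2122Iberville'.encode('utf-8')

try:
    # On Python 2, `city_bytes + u'\r\n'` performs this decode implicitly.
    city_bytes.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    # Keep the error text so it can be compared with the traceback above.
    message = str(exc)

print(message)
```

The byte reported here is 0xc3, the same one named in my traceback, only at a much smaller position because the input is a single short value.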
I can use the csv writer package on RowsData and it works fine. For reasons that I don't want to get into, I need to use codecs to output the CSV file. I can't figure out what is going on. Can anyone help me fix this issue? Thanks in advance.