Hi, I'm running a program that parses a table from an HTML page. It all works fine and I can print the data I'm extracting, but when I try to write the data to a text file I get the error message below. Can anyone help me, please? I don't know what I'm missing.

myfile.write(tds[0].text+ ","+ tds[4].text+ ","+ tds[7].text+ ","+ tds[12].text+ ","+ tds[14].text+ ","+ tds[17].text)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "teste.py", line 14, in <module>
    myfile.write(tds[0].text+ ","+ tds[4].text+ ","+ tds[7].text+ ","+ tds[12].text+ ","+ tds[14].text+ ","+ tds[17].text.encode('utf8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
Ruben Bermudez
user3319895

2 Answers

You are mixing text types. The string you want to write to the file is built from plain "," literals and the .text attributes of the cells, so you are mixing encodings. A more elegant solution exists, but a quick and dirty way to get this working may be to use:

str(tds[*])

instead of

tds[*].text
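
To illustrate the mixing described above, here is a minimal sketch. It assumes Python 2 semantics, where concatenating a unicode string with a byte string triggers an implicit ASCII decode of the bytes. The decode is spelled out explicitly so the snippet also runs on Python 3, and `cell_text` is a made-up value, not data from the question.

```python
# -*- coding: utf-8 -*-
# Hypothetical cell text: a number followed by a non-breaking space (U+00A0).
cell_text = u"31.21\u00a0"

# Encoding to UTF-8 turns the non-breaking space into the bytes 0xC2 0xA0.
encoded = cell_text.encode("utf8")

# Python 2 performs this decode implicitly when a unicode string is
# concatenated with a byte string; here it is written out explicitly.
try:
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The byte 0xc2 named in the traceback is the first byte of a UTF-8 encoded non-breaking space, which fits the trailing character seen after the number in the comments below.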
sahutchi
  • When I use str(tds[*]) the write works, but it writes the whole td tag, not just the contents, like this: 31.21¬† – user3319895 Apr 13 '14 at 23:59
  • Sorry, I missed that part of your question. It now sounds like you are asking how to parse HTML or XML. Have you used Beautiful Soup before? http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html – sahutchi Apr 14 '14 at 00:40
  • I'm able to parse, and I have already written a file from another table that I parsed. I'm doing the same thing I did before, but this time it is not working. – user3319895 Apr 14 '14 at 01:19

To understand why your code breaks, check out this answer about the basics of Python and Unicode.

In your case it will probably help to use tds[*].text.decode("encoding of original HTML")
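
An alternative to decoding, sketched here under assumptions: if the .text values are already unicode (as they are with Beautiful Soup), one option is to keep every piece as unicode, join them, and let the file object encode once on write. The `cells` list and the `table.txt` file name are hypothetical stand-ins for the `tds[...]` values and output file in the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-ins for tds[0].text, tds[4].text, ...: all unicode.
cells = [u"ABC", u"31.21\u00a0", u"7"]

# Join while everything is still unicode, using a unicode separator.
line = u",".join(cells) + u"\n"

# Write to a temporary directory so the sketch has no side effects.
out_path = os.path.join(tempfile.mkdtemp(), "table.txt")

# io.open encodes to UTF-8 on write, so no manual .encode() is needed.
with io.open(out_path, "w", encoding="utf-8") as myfile:
    myfile.write(line)

# Read the file back with the same encoding to check the round trip.
with io.open(out_path, "r", encoding="utf-8") as myfile:
    round_trip = myfile.read()

print(round_trip == line)
```

The point of this design is that encoding happens in exactly one place, at the file boundary, so unicode and byte strings are never concatenated.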

Community
wedi