
I have already imported the data from the web page using urllib.request and BeautifulSoup. Now I want to export the results to a CSV file. Is that possible?

from bs4 import BeautifulSoup
import urllib.request
import csv

url = 'https://www.fundamentus.com.br/resultado.php'
r = urllib.request.urlopen(url).read()

soup = BeautifulSoup(r, 'lxml')

data = []
table = soup.find('table', { "id" : "resultado" })
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

print(data)
  • Sure, it's possible. It looks like you're almost there. Can you show us how you've tried to use the `csv` module? – larsks Jan 03 '20 at 12:17
  • I'm just getting started... I tried this: `f = csv.writer(open('fundamentus.csv', 'w'))` `f.writerow(['Papel', 'Cotação'])` – Caroline Scholles Jan 03 '20 at 12:21
  • @tripleee it worked, but the encoding is still incorrect in the CSV file =/ – Caroline Scholles Jan 03 '20 at 12:26
  • There are many reasons that could happen. If the HTML contains UTF-8 and you are on Python 3, it might not be a trivial problem, but for a start, examine those. Another common beginner problem is looking at completely good UTF-8 with some random Windows tool which expects a legacy 8-bit encoding and believing that the problem is in the encoding, not in the tool. If you still can't figure it out, probably accept the duplicate here and ask a new question about that. See also the [Stack Overflow `character-encoding` tag info page](/tags/character-encoding/info) which contains some hints. – tripleee Jan 03 '20 at 12:35
  • Well, python 3 and linux... Still, this is happening: https://docs.google.com/spreadsheets/d/12NYm4Vzp9cu4bjEQQAuM_VelMrlN0KAMV1hKiUlnS90/edit?usp=sharing – Caroline Scholles Jan 03 '20 at 12:39
  • I will take a look at the documentation you sent – Caroline Scholles Jan 03 '20 at 12:40
  • The spreadsheet (yech) contains `bytes` objects, you have to decode them. The Portuguese word in cell B1 looks like it contains valid UTF-8 once you decode. See e.g. https://stackoverflow.com/questions/51200908/python3-decode-utf-8-bytes-converted-as-string – tripleee Jan 03 '20 at 12:41
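
Putting the comments together, here is a minimal sketch of one way the export could look. It assumes `data` is the list of rows built in the question; the helper name `write_rows` and the sample values are hypothetical and only there for illustration. The file is opened with an explicit UTF-8 encoding, and any `bytes` values are decoded before writing so they do not end up in the file as `b'...'` text.

```python
import csv

def write_rows(path, header, rows):
    """Write a header line and a list of rows to a UTF-8 CSV file."""
    # newline='' lets the csv module control line endings;
    # encoding='utf-8' keeps accented text such as 'Cotação' intact.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            # Decode stray bytes values so they are not written as "b'...'".
            writer.writerow(
                [c.decode('utf-8') if isinstance(c, bytes) else c for c in row]
            )

# Made-up stand-in for the scraped `data` list (not real values from the site).
sample = [['ABCD3', '10,00'], ['EFGH4', b'Cota\xc3\xa7\xc3\xa3o']]
write_rows('fundamentus.csv', ['Papel', 'Cotação'], sample)
```

With the code from the question, the call would be something like `write_rows('fundamentus.csv', header, data)`, where `header` lists one name per table column. Note that the `if ele` filter in the question drops empty cells, so rows with blank values may come out shorter than the header.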
