How can I read these cells from HTML code with Python web scraping?

Question

I want to scrape the exchange rate information from this website and then store it in a database: https://www.mnb.hu/arfolyamok

I need this part of the HTML:

<tbody>
    <tr>
        <td class="valute"><b>CHF</b></td>
        <td class="valutename">svájci frank</td>
        <td class="unit">1</td>
        <td class="value">284,38</td>
    </tr>
    <tr>
        <td class="valute"><b>EUR</b></td>
        <td class="valutename">euro</td>
        <td class="unit">1</td>
        <td class="value">308,54</td>
    </tr>
    <tr>
        <td class="valute"><b>USD</b></td>
        <td class="valutename">USA dollár</td>
        <td class="unit">1</td>
        <td class="value">273,94</td>
    </tr>
</tbody>

So I wrote some code, but something is wrong with it. How can I fix it, and what do I have to change? I need only the "valute", "valutename", "unit" and "value" data. I am working with Python 2.7.13 on Windows 7.

The error message is: "There's an error in your program: unindent does not match any outer indentation level"

The code is here:

import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('tbody', attrs={'class': 'stripe'})

table = str(soup)
table = table.split("<tbody>")

list_of_rows = []
for row in table[1].findAll('tr')[1:]:
    list_of_cells = []
   for cell in row.findAll('td'):
       text = cell.text.replace('&nbsp;', '')
        list_of_cells.append(text)
   list_of_rows.append(list_of_cells)

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)

Well, you clearly have some indentation problems around `for` loops. You need to use the same amount of spaces... — 098799, Jun 08 '17 at 14:39
Indentation in Python needs to be a multiple of 4 spaces. I suggest you either fix this manually or (preferably) use a code formatter for Python such as [autopep8](https://stackoverflow.com/questions/14328406/tool-to-convert-python-code-to-be-pep8-compliant). — Hat, Jun 08 '17 at 14:44
@Hat Actually, it can be howevermany spaces you want, one is enough, but needs to be applied consistently. — 098799, Jun 08 '17 at 14:46
@098799 Well it *can* be however many spaces you want, but *should* it be? The [PEP 8 Style Guide for Python](https://www.python.org/dev/peps/pep-0008/#indentation) says to use 4 spaces per indentation level. — Hat, Jun 08 '17 at 14:49
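
The rule discussed in these comments can be checked without running the scraper. The following is a minimal sketch, assuming Python 3; the helper name `indentation_error` and the two snippets are made up for illustration and only mimic the shape of the loop in the question. It compiles each snippet and reports the compiler's complaint, if any:

```python
def indentation_error(source):
    """Return the compiler's indentation error message, or None if there is none."""
    try:
        # compile() only parses the text; the loop itself is never executed.
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


# Shaped like the loop in the question: the inner `for` is indented by 3 spaces,
# which matches neither the 0-space nor the 4-space level opened before it.
broken = (
    "for row in rows:\n"
    "    cells = []\n"
    "   for cell in row:\n"
    "       cells.append(cell)\n"
)

# Same loop with one consistent step of 4 spaces per level.
fixed = (
    "for row in rows:\n"
    "    cells = []\n"
    "    for cell in row:\n"
    "        cells.append(cell)\n"
)

print(indentation_error(broken))
print(indentation_error(fixed))
```

Any consistent step would compile; 4 spaces is simply the PEP 8 convention mentioned above.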

score 0 · Answer 1 · answered Jul 13 '17 at 13:33

Your code is indented inconsistently from line 18, `for cell in row.findAll('td'):`, to line 20, `list_of_cells.append(text)`. Here is the fixed code:

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
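# Keep the parsed <tbody> tag. Calling str(soup).split("<tbody>") would
# produce plain strings, which have no findAll() method.
table = soup.find('tbody')

list_of_rows = []
# The <tbody> shown in the question holds only data rows, so none is skipped.
for row in table.findAll('tr'):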
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('&nbsp;', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)

After running this code, however, you will hit another problem: a character encoding error. It reads: "SyntaxError: Non-ASCII character '\xc3' in file testoasd.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details"

How to fix that? Simple enough: add the encoding declaration `# -*- coding: utf-8 -*-` at the very top of your script (it must be on the first or second line). That should fix it.
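
For illustration, here is a minimal sketch of where the declaration goes. It assumes Python 3, where UTF-8 is already the default source encoding and the line is optional but harmless, and it writes to a temporary file (a hypothetical `rates.csv`) rather than the original `inmates.csv`:

```python
# -*- coding: utf-8 -*-
# The declaration above has to sit on the first or second line of the file
# (PEP 263). Python 2 needs it as soon as the source contains non-ASCII text.
import csv
import os
import tempfile

header = ["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"]

# Write the header with an explicit encoding so the accented characters
# are stored as UTF-8 regardless of the platform default.
path = os.path.join(tempfile.mkdtemp(), "rates.csv")
with open(path, "w", newline="", encoding="utf-8") as outfile:
    csv.writer(outfile).writerow(header)

# Read the row back to confirm the text survived the round trip.
with open(path, newline="", encoding="utf-8") as infile:
    first_row = next(csv.reader(infile))

print(first_row == header)
```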

EDIT: I just noticed that you are also importing and using BeautifulSoup incorrectly. `from BeautifulSoup import BeautifulSoup` is the old BeautifulSoup 3 import, so I changed it to `from bs4 import BeautifulSoup`. When constructing a BeautifulSoup object you should also specify a parser. So,

soup = BeautifulSoup(html)

would become:

soup = BeautifulSoup(html, "html.parser")
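
To show the explicit parser in context, here is a minimal sketch that parses the markup quoted in the question rather than the live page. It assumes Python 3 with the `bs4` package installed; `SAMPLE_HTML` and `extract_rates` are names made up for this example, and the real page may be structured differently:

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the live page may differ.
SAMPLE_HTML = """
<tbody>
    <tr>
        <td class="valute"><b>CHF</b></td>
        <td class="valutename">svájci frank</td>
        <td class="unit">1</td>
        <td class="value">284,38</td>
    </tr>
    <tr>
        <td class="valute"><b>EUR</b></td>
        <td class="valutename">euro</td>
        <td class="unit">1</td>
        <td class="value">308,54</td>
    </tr>
    <tr>
        <td class="valute"><b>USD</b></td>
        <td class="valutename">USA dollár</td>
        <td class="unit">1</td>
        <td class="value">273,94</td>
    </tr>
</tbody>
"""


def extract_rates(html):
    """Return one [valute, valutename, unit, value] list per table row."""
    # Naming the parser keeps the result independent of which parsers are installed.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 4:  # ignore rows that do not have the four expected cells
            rows.append(cells)
    return rows


rates = extract_rates(SAMPLE_HTML)
for rate in rates:
    print(rate)
```

The values stay as strings with a decimal comma (for example "284,38"), so they would still need converting before being stored as numbers in a database.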
