I would like to extract the following table content and save it to a CSV file using pandas, but only the dates (e.g. Thu, 11/02) and the values in the rows tagged €/MWh. Thank you all very much in advance.

Source code:

<table cellspacing="0" cellpadding="0" border="0" class="list hours responsive" width="100%">
<tbody>
    <tr>
        <th class="title"></th>
        <th class="units"></th>
        <th>Thu, 11/02</th>
        <th>Fri, 12/02</th>
        <th>Sat, 13/02</th>
        <th>Sun, 14/02</th>
        <th>Mon, 15/02</th>
        <th>Tue, 16/02</th>
        <th>Wed, 17/02</th>
    </tr>
    <tr class="no-border">
        <td class="title">
            00 - 01
        </td>
        <td>€/MWh</td>
        <td>23.82</td>
        <td>22.81</td>
        <td>22.23</td>
        <td>13.06</td>
        <td>16.57</td>
        <td>25.99</td>
        <td>32.45</td>
    </tr>
    <tr>
        <td>&nbsp;</td>
        <td>MWh</td>
        <td>10,266.0</td>
        <td>9,626.6</td>
        <td>12,255.9</td>
        <td>11,084.7</td>
        <td>11,039.5</td>
        <td>13,134.7</td>
        <td>9,958.1</td>
    </tr>
    <tr class="no-border">
        <td class="title">
            01 - 02
        </td>
        <td>€/MWh</td>
        <td>21.48</td>
        <td>21.59</td>
        <td>21.10</td>
        <td>12.17</td>
        <td>16.00</td>
        <td>23.65</td>
        <td>31.27</td>
    </tr>
    <tr>
        <td>&nbsp;</td>
        <td>MWh</td>
        <td>9,843.3</td>
        <td>9,494.4</td>
        <td>11,823.3</td>
        <td>10,531.9</td>
        <td>9,970.5</td>
        <td>12,875.6</td>
        <td>9,958.8</td>
    </tr>
    <tr class="no-border">
        <td class="title">
            02 - 03
        </td>
        <td>€/MWh</td>
        <td>21.00</td>
        <td>21.30</td>
        <td>20.21</td>
        <td>8.81</td>
        <td>14.55</td>
        <td>22.91</td>
        <td>29.72</td>
    </tr>
    <tr>
        <td>&nbsp;</td>
        <td>MWh</td>
        <td>9,857.0</td>
        <td>9,427.9</td>
        <td>11,755.2</td>
        <td>10,061.9</td>
        <td>9,881.7</td>
        <td>12,841.0</td>
        <td>9,896.9</td>
    </tr>
    <tr class="no-border">
        <td class="title">
            03 - 04
        </td>
        <td>€/MWh</td>
        <td>19.94</td>
        <td>19.86</td>
        <td>19.94</td>
        <td>6.74</td>
        <td>13.14</td>
        <td>22.04</td>
        <td>27.44</td>
    </tr>
    <tr>
        <td>&nbsp;</td>
        <td>MWh</td>
        <td>9,486.2</td>
        <td>10,492.7</td>
        <td>12,609.1</td>
        <td>11,216.6</td>
        <td>10,199.9</td>
        <td>11,209.7</td>
        <td>9,698.5</td>
    </tr>
</tbody>
</table>

Zulu
  • Please [edit] your question and 1) improve the indentation of your HTML, 2) add the Python code you have tried. –  Feb 17 '16 at 11:48
  • http://stackoverflow.com/questions/11790535/extracting-data-from-html-table – CodeMonkey Feb 17 '16 at 11:49
  • Does this answer your question? [Converting a HTML table to a CSV in Python](https://stackoverflow.com/questions/54668618/converting-a-html-table-to-a-csv-in-python) – EdKenbers Jun 21 '20 at 09:50

4 Answers

0

The following code prints the contents of the table cell by cell, row by row:

from bs4 import BeautifulSoup
import urllib.request

# Read the locally saved test page
response = urllib.request.urlopen('file:///F:/test.html')
html = response.read()
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs={'class': 'list hours responsive'})
rows = table.find_all('tr')
for tr in rows:
    cols = tr.find_all('td')
    for td in cols:
        # First text node of the cell; "000" marks a cell with no text at all
        node = td.find(string=True)
        text = str(node) if node is not None else "000"
        print(text + ",")

My test HTML page was stored as test.html on the F: drive:

<html>
<body>
<table cellspacing="0" cellpadding="0" border="0" class="list hours responsive" width="100%">
                <tbody>
                <tr>
                    <th class="title"></th>
                    <th class="units"></th>
                                                <th>Thu, 11/02</th>
                                                <th>Fri, 12/02</th>
                                                <th>Sat, 13/02</th>
                                                <th>Sun, 14/02</th>
                                                <th>Mon, 15/02</th>
                                                <th>Tue, 16/02</th>
                                                <th>Wed, 17/02</th>

                </tr>
                                        <tr class="no-border">
                        <td class="title">
                                                                00 - 01
                                                        </td>
                        <td>€/MWh</td>
                                                        <td>23.82</td>
                                                        <td>22.81</td>
                                                        <td>22.23</td>
                                                        <td>13.06</td>
                                                        <td>16.57</td>
                                                        <td>25.99</td>
                                                        <td>32.45</td>
                                                </tr>
                    <tr>
                        <td>&nbsp;</td>
                        <td>MWh</td>
                                                        <td>10,266.0</td>
                                                        <td>9,626.6</td>
                                                        <td>12,255.9</td>
                                                        <td>11,084.7</td>
                                                        <td>11,039.5</td>
                                                        <td>13,134.7</td>
                                                        <td>9,958.1</td>
                                                </tr>
                                        <tr class="no-border">
                        <td class="title">
                                                                01 - 02
                                                        </td>
                        <td>€/MWh</td>
                                                        <td>21.48</td>
                                                        <td>21.59</td>
                                                        <td>21.10</td>
                                                        <td>12.17</td>
                                                        <td>16.00</td>
                                                        <td>23.65</td>
                                                        <td>31.27</td>
                                                </tr>
                    <tr>
                        <td>&nbsp;</td>
                        <td>MWh</td>
                                                        <td>9,843.3</td>
                                                        <td>9,494.4</td>
                                                        <td>11,823.3</td>
                                                        <td>10,531.9</td>
                                                        <td>9,970.5</td>
                                                        <td>12,875.6</td>
                                                        <td>9,958.8</td>
                                                </tr>
                                        <tr class="no-border">
                        <td class="title">
                                                                02 - 03
                                                        </td>
                        <td>€/MWh</td>
                                                        <td>21.00</td>
                                                        <td>21.30</td>
                                                        <td>20.21</td>
                                                        <td>8.81</td>
                                                        <td>14.55</td>
                                                        <td>22.91</td>
                                                        <td>29.72</td>
                                                </tr>
                    <tr>
                        <td>&nbsp;</td>
                        <td>MWh</td>
                                                        <td>9,857.0</td>
                                                        <td>9,427.9</td>
                                                        <td>11,755.2</td>
                                                        <td>10,061.9</td>
                                                        <td>9,881.7</td>
                                                        <td>12,841.0</td>
                                                        <td>9,896.9</td>
                                                </tr>
                                        <tr class="no-border">
                        <td class="title">
                                                                03 - 04
                                                        </td>
                        <td>€/MWh</td>
                                                        <td>19.94</td>
                                                        <td>19.86</td>
                                                        <td>19.94</td>
                                                        <td>6.74</td>
                                                        <td>13.14</td>
                                                        <td>22.04</td>
                                                        <td>27.44</td>
                                                </tr>
                    <tr>
                        <td>&nbsp;</td>
                        <td>MWh</td>
                                                        <td>9,486.2</td>
                                                        <td>10,492.7</td>
                                                        <td>12,609.1</td>
                                                        <td>11,216.6</td>
                                                        <td>10,199.9</td>
                                                        <td>11,209.7</td>
                                                        <td>9,698.5</td>
                                                </tr>

                                    </tbody>
            </table>
            </body>
</html>

Output of the code is as follows:

00 - 01,
€/MWh,
23.82,
22.81,
22.23,
13.06,
16.57,
25.99,
32.45,
 ,
MWh,
10,266.0,
9,626.6,
12,255.9,
11,084.7,
11,039.5,
13,134.7,
9,958.1,

01 - 02,
€/MWh,
21.48,
21.59,
21.10,
12.17,
16.00,
23.65,
31.27,
 ,
MWh,
9,843.3,
9,494.4,
11,823.3,
10,531.9,
9,970.5,
12,875.6,
9,958.8,

02 - 03,
€/MWh,
21.00,
21.30,
20.21,
8.81,
14.55,
22.91,
29.72,
 ,
MWh,
9,857.0,
9,427.9,
11,755.2,
10,061.9,
9,881.7,
12,841.0,
9,896.9,

03 - 04,
€/MWh,
19.94,
19.86,
19.94,
6.74,
13.14,
22.04,
27.44,
 ,
MWh,
9,486.2,
10,492.7,
12,609.1,
11,216.6,
10,199.9,
11,209.7,
9,698.5,
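
The flat output above still mixes the €/MWh and MWh rows and loses the date header. As a rough sketch of one way to keep only the dates and the €/MWh values, the following uses only the standard library (`html.parser` and `csv`) instead of BeautifulSoup. The names `TableRows`, `price_rows` and `SAMPLE` are made up for this illustration, and the sample is a shortened copy of the table with two date columns.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being filled
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def price_rows(html):
    """Return the date header plus one row per hour, keeping only €/MWh rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    result = [["hour"] + header[2:]]           # skip the two empty header cells
    for row in body:
        if len(row) > 2 and row[1] == "€/MWh":
            result.append([row[0]] + row[2:])  # hour label followed by the prices
    return result


# Shortened sample with the same structure as the table in the question
SAMPLE = """
<table class="list hours responsive"><tbody>
<tr><th class="title"></th><th class="units"></th><th>Thu, 11/02</th><th>Fri, 12/02</th></tr>
<tr class="no-border"><td class="title"> 00 - 01 </td><td>€/MWh</td><td>23.82</td><td>22.81</td></tr>
<tr><td>&nbsp;</td><td>MWh</td><td>10,266.0</td><td>9,626.6</td></tr>
<tr class="no-border"><td class="title"> 01 - 02 </td><td>€/MWh</td><td>21.48</td><td>21.59</td></tr>
<tr><td>&nbsp;</td><td>MWh</td><td>9,843.3</td><td>9,494.4</td></tr>
</tbody></table>
"""

rows = price_rows(SAMPLE)
buffer = io.StringIO()  # swap for open("prices.csv", "w", newline="") to write a file
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Because the date labels contain a comma, `csv.writer` quotes them, so the header stays intact when the file is read back.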
Pranav Waila
0

There is an encoding problem; you should encode your response before printing it.
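
A minimal sketch of what encoding before printing can look like. It assumes the failure is a console that cannot represent the "€" sign, which the thread does not state explicitly. The variable names are illustrative only.

```python
text = "€/MWh"

# str -> bytes: the euro sign becomes three UTF-8 bytes
encoded = text.encode("utf-8")
print(encoded)  # b'\xe2\x82\xac/MWh'

# Alternative for an ASCII-only console: replace what cannot be represented
ascii_safe = text.encode("ascii", errors="replace").decode("ascii")
print(ascii_safe)  # ?/MWh
```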

xlm
secnoodle
0

You can refer to this example code:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup

url = 'http://news.sina.com.cn/'
res = requests.get(url)
res.encoding = 'utf-8'  # This is the key line: decode the response body as UTF-8
soup = BeautifulSoup(res.text, 'html.parser')
tags = soup.select('a')

for tag in tags:
    try:
        link = str(tag['href'])
        if link.startswith('http'):
            print(link)
        else:
            print(False)
    except KeyError:
        # The <a> tag has no href attribute
        print('null')
secnoodle
0

There is an easy, if sneaky, way to get around this.
I opened the HTML in an online HTML viewer and printed the result,
then copied it and pasted it into an Excel file.
Now you have two options:

  1. Edit the values in Excel and export the result as CSV;
  2. Save the Excel file, load it with pandas, manipulate it, and then export it as CSV.

For the second option, you would use the units column to look for the "€" symbol.
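
The second option could look roughly like the sketch below. To keep it self-contained, the frame is built inline rather than loaded from a workbook (loading would go through `pandas.read_excel`), and the column names `hour` and `unit` are assumptions about how the pasted sheet is labelled.

```python
import pandas as pd

# Hypothetical frame shaped like the pasted sheet: hour, unit, then one column per date
df = pd.DataFrame(
    [
        ["00 - 01", "€/MWh", "23.82", "22.81"],
        [None, "MWh", "10,266.0", "9,626.6"],
        ["01 - 02", "€/MWh", "21.48", "21.59"],
        [None, "MWh", "9,843.3", "9,494.4"],
    ],
    columns=["hour", "unit", "Thu, 11/02", "Fri, 12/02"],
)

# Keep only the rows whose unit contains the euro sign, then drop the unit column
prices = df[df["unit"].str.contains("€", regex=False)].drop(columns="unit")

# Convert the price strings in the date columns to floats
date_cols = list(prices.columns[1:])
prices = prices.astype({col: float for col in date_cols})

# Without a path, to_csv returns the CSV text; pass e.g. "prices.csv" to write a file
csv_text = prices.to_csv(index=False)
print(csv_text)
```

Filtering on the unit column before converting to numbers avoids having to parse the thousands separators in the MWh rows at all.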