0

Second question of the day. This is the code I have written until now. I am trying to extract the column of Settl.Prices and Vol.Exchange from this table: https://www.eex.com/en/market-data/power/futures/phelix-at-futures#!/2018/7/3 The results in the row is a mess, and I have tried to make it better with re.sub but I wasn't able to keep numbers, commas, and dots, not to lose position and number decimal separator. Any idea on how to have the two columns stored in two lists?

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.by import By
import datetime
import time
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait 

today=datetime.date.today()
browser = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
my_url = 'https://www.eex.com/en/market-data/power/futures/phelix-at-futures#!/'+str(today.year)+'/'+str(today.month)+'/'+str(today.day-1)
browser.get(my_url)
button = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "ul.tabs.filter_wrap.clearfix li.ng-scope:nth-child(3)>a"))).click()

page_html = browser.page_source
page_soup = soup(page_html, "html.parser")
browser.close()
time.sleep(5)
table = page_soup.find('table')
table_rows = table.findAll('tr')

for tr in table_rows:
    list = ""
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

actual output

['\n              Cal-19\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              0.51\n            ', '\n              -\n            ', '\n              -\n            ', '\n              46.15\n            ', '\n              -\n            ', '\n              -\n            ', '\n              12\n            ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n              Cal-20\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              0.54\n            ', '\n              -\n            ', '\n              -\n            ', '\n              44.62\n            ', '\n              -\n            ', '\n              -\n            ', '\n              1\n            ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n              Cal-21\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              0.65\n            ', '\n              -\n            ', '\n              -\n            ', '\n              43.70\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n              Cal-22\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              0.55\n            ', '\n              -\n            ', '\n              -\n            ', '\n              45.08\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n              Cal-23\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              0.55\n            ', '\n              -\n            ', '\n              -\n            ', '\n              45.85\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
['\n              Cal-24\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '\n              0.53\n            ', '\n              -\n            ', '\n              -\n            ', '\n              46.83\n            ', '\n              -\n            ', '\n              -\n            ', '\n              -\n            ', '', '\n\n']
['\n\nloading...\n\nan error occurred while loading the chart...\nPlease reload the chart.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInvalid Date Format: Please use the format YYYY-MM-DD.\n\n\n\n\nx\n\n\n\n\nIntraday Prices\nSettlement Prices\n\n\n\n\n\n\nall series\n\n\n\n\n\n\n\n\n\n\n\n\n\n']

wanted output

46.15,- (the one from the column adjacent)
44.62,-
43.70,-              
45.08,-  
45.85,-
46.83,-
G. Bartowski
  • 115
  • 9

1 Answers1

2

Instead of trying to work with the whole page and get the value from the html table data in the table, it would be much easier if you just called the API with the right parameters.

The API is the following :

https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1/{year}/{month}.{day}.json

Example :

https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1/2018/06.07.json

It fetches a JSON, then you can manipulate the data inside it as you want, you can basically build your data frame with the appropriate values using pandas. Seems an easier solution than looking directly into the page and you won't have any problems with your values.

Here are some links that may help you read a JSON :

Parsing JSON : Parsing values from a JSON file?

JSON to pandas DF : JSON to pandas DataFrame

UPDATE :

I wrote a chunk of code that should help you get the idea :

from urllib.request import Request, urlopen
import json

request=Request('https://www.eex.com/data//view/data/detail/ws-power-futures-austrian-v1/2018/06.07.json')
response = urlopen(request)
data = response.read()
d = json.loads(data)


# this first obj corresponds to : P-Power-F-AT-Peak-Quarter
first_obj = d["data"][0]["rows"]

values = []

for row in first_obj:
    if('settlementPrice' in row["data"]):
        sp = row["data"]["settlementPrice"]
        values.append(sp)

print(values)

The JSON fetched looked like this :

    {
       "data": [
            {
               "identifier": "P-Power-F-AT-Peak-Quarter",
               "rows": [
                    {
                     "data" : {'param1': value, 'param2': value, ...},
                     "contractIdentifier": value,
                    },
                    {
                     "data" : {'param1': value, 'param2': value, ...},
                     "contractIdentifier": value,
                    },
                    ...
               ]
             },
             {
               "identifier": "P-Power-F-AT-Peak-Month",
               "rows": [
                    {
                     "data" : {'param1': value, 'param2': value, ...},
                     "contractIdentifier": value,
                    },
                    {
                     "data" : {'param1': value, 'param2': value, ...},
                     "contractIdentifier": value,
                    },
                    ...
               ]
             },
             {
               "identifier": "P-Power-F-AT-Base-Year",
               "rows": [
                    {
                     "data" : {'param1': value, 'param2': value, ...},
                     "contractIdentifier": value,
                    },
                    {
                     "data" : {'param1': value, 'param2': value, ...},
                     "contractIdentifier": value,
                    },
                    ...
               ]
             },
             ...

And the results i printed out look like this :

[53.36, 63.86, 62.63, 46.83, 47.44, 59.28, 58.7]

So basically what you do is load the JSON, parse it and store the object that you want to get your data from. In the example of code i gave you, i took the first object, which is located at the index "0", which corresponds to the indentifier "P-Power-F-AT-Peak-Quarter" from the 7th of June 2018 ( parameters of this in the string of the url). You can choose which object to take by parsing you data in "d['data']" and stopping at the indentifier value that you want to take values from.

If you want to know what are the parameter names, just open the URL in a browser or download the JSON file and open it in you favorite editor.

Hope this helps a bit.

Zakaria Sahmane
  • 210
  • 2
  • 16
  • im not able to follow this path – G. Bartowski Jul 03 '18 at 15:14
  • @DavideRavera I updated the post and added the code that will help you fetch the values. I didn't use pandas because i'm in a computer which isn't mine and doesn't have pandas on it. I'll try and update it with pandas whenever i can because it would be much more efficient. Hope this helps. – Zakaria Sahmane Jul 04 '18 at 08:27
  • 1
    Thanks, at the end I managed to do it following your path yesterday. Thanks anyway for the update. – G. Bartowski Jul 04 '18 at 10:38