
I could really use some help with scraping the data from the line or donut charts on this website. I need this data for a study project focusing on forecasting solar and wind production in the Netherlands.

I'd like to use Python for the task, and I have attempted to do so using Selenium.

The data is stored in canvas elements, which makes this a bit more challenging than expected, and I could use some help figuring out the right approach to extract it. Any help would be much appreciated.

My approach so far has been to locate the line-chart element and then 'move the mouse' (using Selenium's ActionChains and its move_to_element_with_offset function) over the chart from left to right.

For each step, I'd record the data that will be available in the hover text and somehow link that to the right timestamp.
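
To make the "link that to the right timestamp" step concrete, here is a minimal sketch of how the sweep positions could be planned. It assumes the x-axis covers exactly one day and spans the full element width, which ignores any chart margins, so the timestamps are only approximate. The name sweep_points is hypothetical and not part of Selenium.

```python
from datetime import datetime, timedelta


def sweep_points(width_px, steps, day_start, span=timedelta(days=1)):
    """Return (x_from_left_px, approx_timestamp) pairs for a left-to-right sweep.

    Assumption: time maps linearly onto the full pixel width of the element.
    """
    if steps < 2:
        raise ValueError("need at least two steps to cover the chart")
    points = []
    for i in range(steps):
        frac = i / (steps - 1)            # 0.0 at the left edge, 1.0 at the right
        x = round(frac * (width_px - 1))  # pixel column, measured from the left
        points.append((x, day_start + frac * span))
    return points
```

Each x could then be passed to move_to_element_with_offset. Depending on the Selenium version, the offset may be measured from the element's top-left corner or from its centre, so check which applies and subtract half the width if needed.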

See here for a screenshot of how it looks in my browser. Note how the Zonne energie data value appears in the div below when hovering:

[Screenshot: how it looks in the browser]

The problem, however, is that I can't find the data in the page source, probably because I haven't figured out how to hover the mouse over the chart using Selenium.

My initial code is:

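import pathlib

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

chrome_driver_path = pathlib.Path(__file__).parent / "chromedriver"
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(service=Service(str(chrome_driver_path)), options=options)
url = "https://energieopwek.nl"
driver.get(url)

line_chart = driver.find_element(By.ID, "linechart_1")
action = ActionChains(driver)  # builds the mouse move and click sequence
action.move_to_element(line_chart).click().perform()  # clicking on the chart
soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup.prettify())  # I'd expect to see the data in the page source, but it's not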

Here is the page source output. I expected data from the chart to be present in the divs, as in the screenshot above:

<div _echarts_instance_="ec_1652165210746" class="eo-chart" id="linechart_1" style="-webkit-tap-highlight-color: transparent; user-select: none; position: relative; background: rgba(0, 0, 0, 0);">
  <div style="position: relative; overflow: hidden; width: 744px; height: 385px; padding: 0px; margin: 0px; border-width: 0px; cursor: default;">
    <canvas data-zr-dom-id="zr_0" height="385" style="position: absolute; left: 0px; top: 0px; width: 744px; height: 385px; user-select: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0); padding: 0px; margin: 0px; border-width: 0px;" width="744">
    </canvas>
  </div>
  <div>
  --- WHERE IS THE DATA?---
  </div>
</div>

Curious to hear if anybody is able to help me here.

Jean Vache
  • Instead of scraping the site, can you not use the API to collect the data? If we change the day, we can see that it calls: `https://energieopwek.nl/data.php?sid=2ecde3&Day=2022-05-05&scale=day`. Maybe that is what you need. – Helder Sepulveda May 09 '22 at 19:01
  • That's a good suggestion and I've tried that, but the data available there is obfuscated and I don't know how to reverse engineer it. – Jean Vache May 10 '22 at 06:41

2 Answers


If this is for a project you are going to publish, you should reach out to the source to ask for permission, or get lawyers involved to make sure you are not breaking the site's terms of service. I get the feeling they might have obfuscated the data to prevent what you are trying to do.


About my comment and the data available on:
https://energieopwek.nl/data.php?sid=2ecde3&Day=2022-05-05&scale=day

Even with the JS code uglified, we can still make out some of it:
... that `return seriesData` caught my eye; it looks like that is the raw data for the chart.

If you know how to use the debugger in the developer console, that is your starting point.

And it looks like there is a way to read JS variables from selenium if that is what you prefer using:
Reading JavaScript variables using Selenium WebDriver
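
As a rough illustration of reading chart data from Python, here is a minimal sketch. The `_echarts_instance_` attribute in the question's page source suggests the chart is drawn with ECharts, so the sketch asks ECharts for the series via `echarts.getInstanceByDom` and `getOption()`. It assumes the page exposes `echarts` as a global, which I have not verified for this site, and `read_chart_series` is a hypothetical helper name.

```python
# JavaScript run inside the page; arguments[0] is the chart's element id.
ECHARTS_SERIES_JS = """
var dom = document.getElementById(arguments[0]);
if (!dom || typeof echarts === 'undefined') { return null; }
var chart = echarts.getInstanceByDom(dom);
if (!chart) { return null; }
return chart.getOption().series.map(function (s) {
    return {name: s.name, data: s.data};
});
"""


def read_chart_series(driver, chart_id="linechart_1"):
    """Return {series name: data points} for an ECharts chart, or {} if unavailable.

    `driver` is any object with Selenium's execute_script(script, *args) method.
    """
    result = driver.execute_script(ECHARTS_SERIES_JS, chart_id)
    if result is None:
        return {}
    series = {}
    for i, s in enumerate(result):
        name = s.get("name") or f"series_{i}"  # fall back for unnamed series
        series[name] = s.get("data") or []
    return series
```

With a real WebDriver that has already loaded the page, `read_chart_series(driver)` would be the whole call. If `echarts` turns out not to be a global, the same `execute_script` call can return whatever variable you found in the debugger instead.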


Helder Sepulveda
  • Thank you for the answer and for the advice regarding potential legal issues with scraping the data. I did manage to inspect the seriesData object and it does contain the data! Next step then will be to read it with python. Thank you again for helping me get started on this. My use of this data will only be for study purposes, so I guess that's ok, but will nevertheless contact the company asking for permission. – Jean Vache May 11 '22 at 13:15

You can take a screenshot with selenium then crop it automatically. Here's an example of something like that I've done before.

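from PIL import Image
from selenium.webdriver.common.by import By

# `driver` is an existing Selenium WebDriver with the page already loaded
element = driver.find_element(By.XPATH, '//*[@id="THIS_WEEK"]')
location = element.location
size = element.size
driver.save_screenshot("due.png")
# Crop box in screenshot pixels: (left, top, right, bottom)
left = location['x']
top = location['y']
right = left + size['width']
bottom = top + size['height']
im = Image.open('due.png')
im = im.crop((int(left), int(top), int(right), int(bottom)))
im.save('due.png')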
Riceblades
  • Thanks, taking a snap and extracting the values from the images is an interesting approach, although a bit more involved. Will keep it as a potential plan b. – Jean Vache May 10 '22 at 10:48