I'm trying to extract data from a dynamic Highcharts graph (users can select the date range to be displayed), and so far I haven't had any luck. (Disclaimer - I'm pretty new at this. Yikes!)
This is the website from which I'd like to extract rainfall data.
And it looks like this:
Manually, I can do it easily by opening the web inspector and copying the relevant bits to a JSON file, then using a Python script to convert them.
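A minimal sketch of what that manual conversion step might look like. The structure of the copied JSON is an assumption here: Highcharts series are commonly stored as [timestamp in milliseconds, value] pairs, but the real payload may differ. The sample data and the names series_to_rows and rows_to_csv are hypothetical, for illustration only.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical sample of what gets copied from the web inspector.
# Assumption: each data point is a [timestamp_in_ms, value] pair.
copied_json = (
    '{"name": "Rainfall", '
    '"data": [[1577836800000, 0.0], [1577923200000, 12.5]]}'
)


def series_to_rows(raw):
    """Turn one copied series into (ISO date, value) rows."""
    series = json.loads(raw)
    rows = []
    for timestamp_ms, value in series["data"]:
        # Convert milliseconds since the Unix epoch to a UTC date.
        moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        rows.append((moment.date().isoformat(), value))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "rainfall"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = series_to_rows(copied_json)
print(rows_to_csv(rows))
```

Keeping the conversion in a function makes it easy to call once per station inside a loop, whatever ends up supplying the JSON.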
However, since I'm planning to extract several rainfall series, and for a pretty large number of stations, I'd love to automate the process so I can loop it through all the stations here. Unfortunately, I can't seem to get that right.
I can't access the JSON files directly - they're behind a password-protected API.
I've also tried navigating to the right branch in the page tree and parsing it with BeautifulSoup, but my best attempt so far produced a large block of unintelligible text as the soup, and nothing for the items that I'm interested in.
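from bs4 import BeautifulSoup
import requests

# Fetch the chart page; requests returns only the HTML the server sends, without running any JavaScript
url = "https://portal.mrcmekong.org/time-series/chartts=24c97a09e761497098a32687a00cf86e"
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
print(soup)

# Look for the container element that Highcharts draws the chart into
items = soup.find_all('div', class_='highcharts-container')
print(items)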
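One possible explanation for the empty result, offered as an assumption rather than a diagnosis of this particular site: Highcharts normally builds the highcharts-container element in the browser with JavaScript, so the HTML that requests receives may not contain it at all. The sketch below uses only the standard library and two hypothetical HTML strings (not the real portal markup) to show the difference between a static page shell and a page after rendering. ClassCounter and count_class are illustrative names.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the HTML carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical markup: what a server might send before JavaScript runs...
static_shell = (
    '<html><body><div id="chart"></div>'
    '<script src="app.js"></script></body></html>'
)
# ...and what the browser's inspector might show after rendering.
rendered = (
    '<html><body><div id="chart">'
    '<div class="highcharts-container"></div></div></body></html>'
)

print(count_class(static_shell, "highcharts-container"))
print(count_class(rendered, "highcharts-container"))
```

Running the same count on the html string from the snippet above would show whether the container is present in the raw response or only appears after the browser has run the page's scripts.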
So far, I've been using Jupyter Notebooks for the operation.
So, I'd be really grateful if you could give me any tips, or point me towards helpful resources. These are the threads on here that I've tried so far, without success:
Can I scrape the raw data from highcharts.js?
Webscrape interactive chart in Python using beautiful soup with loops
How to web scrape a chart by using Python?
How to scrape charts from a website with python? (couldn't get selenium to work in this answer)
I'd be really, really grateful for some help!
Thanks in advance!
Have a lovely day!