I am trying to get weather data from: http://metservice.com/maps-radar/local-observations/local-3-hourly-observations
I found an example here of how to use Ghost to scrape dynamic content, but I have not worked out how to handle the result.
Since Ghost seems to have issues when run in an interactive shell, I use
print(result)
and redirect the output to a file:
python getMetObservation.py > proper_result
This is my Python code:
from ghost import Ghost
url = 'http://metservice.com/maps-radar/local-observations/local-3-hourly-observations'
gh = Ghost(wait_timeout=60)
page, resources = gh.open(url)
result, resources = gh.evaluate("document.getElementsByClassName('obs-content');")
print(result)
When I examine the file, it does contain what I am after, but it also contains a huge amount of information I do not want. It is also not clear how to use the variable result that evaluate returns. Inspecting ghost.py, the call seems to be handled by
self.main_frame.evaluateJavaScript("%s" % script)
in:
def evaluate(self, script):
    """Evaluates script in page frame.

    :param script: The script to evaluate.
    """
    return (
        self.main_frame.evaluateJavaScript("%s" % script),
        self._release_last_resources(),
    )
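To get a feel for what the first element of that returned tuple looks like without printing all of it, something like the following might help. It is only a sketch: describe is a hypothetical helper of my own, and sample is made-up data standing in for result, since I do not know for certain which Python types the Qt binding hands back.

```python
def describe(value, max_depth=2, _depth=0):
    """Return a nested summary of the types in value instead of its full contents."""
    if _depth >= max_depth:
        return type(value).__name__
    if isinstance(value, dict):
        # Keep the keys, summarise each value one level deeper.
        return {key: describe(item, max_depth, _depth + 1) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [describe(item, max_depth, _depth + 1) for item in value]
    return type(value).__name__


# Hypothetical stand-in for `result`; the real structure may differ.
sample = {"length": 2, "0": {"innerHTML": "<p>12 C</p>", "className": "obs-content"}}
summary = describe(sample)
print(summary)
```

Calling describe(result) in the script instead of print(result) should show which keys or indices exist, which would make it easier to decide what to pull out.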
When I execute the command:
document.getElementsByClassName('obs-content');
in a Chromium console I get the correct response.
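One idea I am considering, as a sketch only: have the JavaScript return plain strings rather than the element collection itself, on the assumption that a list of strings converts to Python more cleanly than DOM objects do. The helper name build_text_script is my own, and I have not confirmed how Ghost converts the returned array.

```python
import json


def build_text_script(class_name):
    """Build JavaScript that returns the text of every element with class_name."""
    # json.dumps produces a quoted string literal for the class name.
    quoted = json.dumps(class_name)
    return (
        "Array.prototype.map.call("
        "document.getElementsByClassName(%s), "
        "function (el) { return el.textContent; });" % quoted
    )


script = build_text_script("obs-content")
print(script)

# With Ghost, the script would then be passed to evaluate, for example:
# result, resources = gh.evaluate(script)
```

Array.prototype.map.call is used because getElementsByClassName returns an HTMLCollection, which has no map method of its own.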
I am a beginner when it comes to Python but willing to learn. In case it matters, I am running this in a Python virtual environment under Ubuntu.