As an alternative to the selenium or webkit based approach, you can parse the JavaScript with a JavaScript code parser, like slimit. It definitely raises the complexity and reliability of the web-scraping, since you go down to the bare metal with it. Think of it as a "white box" approach, as opposed to the high-level "black box" approach based on selenium.
I've given an answer to exactly the same problem you are asking about. It involves using slimit to grab an object from the JavaScript code, loading it into a Python data structure via the json module, and parsing the HTML inside with the BeautifulSoup parser.
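To make the three steps concrete, here is a rough, dependency-free sketch of the same pipeline. Note that it is not the slimit code from that answer: it swaps slimit for `json.JSONDecoder.raw_decode` (which only works when the object literal happens to be valid JSON) and BeautifulSoup for the standard library's `html.parser`. The script text, the variable name `pageData` and the helper names are all made up for illustration.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical inline script: an object whose "html" field holds markup.
PAGE_SCRIPT = '''
var pageData = {"title": "Example", "html": "<ul><li>first</li><li>second</li></ul>"};
render(pageData);
'''


def extract_js_object(script, var_name):
    """Decode the object literal assigned to var_name (must be valid JSON)."""
    match = re.search(r"\b%s\s*=\s*" % re.escape(var_name), script)
    if match is None:
        raise ValueError("variable %r not found" % var_name)
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever follows it (here: the trailing ';').
    obj, _end = json.JSONDecoder().raw_decode(script, match.end())
    return obj


class ListItemCollector(HTMLParser):
    """Collect the text content of every <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.items[-1] += data


def list_items(html):
    """Return the text of each <li> in the given HTML fragment."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    return collector.items


if __name__ == "__main__":
    data = extract_js_object(PAGE_SCRIPT, "pageData")  # step 1 + 2
    print(list_items(data["html"]))                     # step 3
```

A real JavaScript parser such as slimit is the better choice when the object uses unquoted keys, single quotes or other syntax that JSON does not allow; the sketch above would fail on those.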