Parsing the HTML and scripts on a webpage using Python?

Question

I'm currently using Beautiful Soup to parse the HTML of a webpage. I would also like to recursively go through any .js files the page references. My goal is to find certain URLs embedded in either the HTML or the JavaScript of a website. I can do this for the base HTML page, but I'm stuck on getting into the JavaScript files. Any help?

Related http://stackoverflow.com/questions/390992/javascript-parser-in-python. — br3w5, Oct 03 '14 at 21:43

score 0 · Accepted Answer · edited May 23 '17 at 10:26

Follow the steps outlined in the accepted answer to this Stack Overflow question. You can then request the resource using, for example, the excellent requests library:

import requests

# Download the script; r.text then holds its source as a string
r = requests.get("http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js")

You can then search r.text with a regular expression to find the links you are looking for.
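As a rough illustration of that search, here is a minimal sketch. The names script_urls, find_urls, ScriptSrcCollector and URL_RE are made up for this example, and the pattern is a deliberately loose assumption about what counts as a URL, so you would probably tighten it for the links you care about. To keep the sketch dependency-free it uses the standard library's html.parser to collect script sources; with Beautiful Soup, soup.find_all("script", src=True) should give you the same tags.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Loose, assumed pattern: an absolute http(s) URL running up to the first
# whitespace, quote, angle bracket, backslash or closing parenthesis.
URL_RE = re.compile(r"""https?://[^\s"'<>\\)]+""")


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_urls(html, base_url):
    """Return absolute URLs of the external scripts referenced in html."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    # Relative paths such as "/static/app.js" are resolved against the page URL
    return [urljoin(base_url, src) for src in collector.sources]


def find_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

To combine this with requests, you could fetch the page, call script_urls(page.text, page.url), request each returned URL, and run find_urls on both page.text and each script's r.text. A regex only sees URLs that appear literally in the source; anything the script builds at runtime by concatenating strings will be missed.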

If you still need to parse the JavaScript itself, the most recent answer to this Stack Overflow question recommends slimit for parsing the source once you have downloaded it.
