
How do I parse JavaScript code within HTML source with Python? For example, I want to extract the productList object.

Here is my source:

<html>
<body>
<div id="content-wrapper" class="row-fluid clearfix" role="contentinfo">
<!-- html content -->
</div>


   <script>
    var productList = { "daaa" : "ddddd"};
   </script>

</body>
</html>
parkerproject
  • Do either of these help? http://stackoverflow.com/questions/390992/javascript-parser-in-python http://stackoverflow.com/questions/18368058/how-can-i-parse-javascript-variables-using-python – Curtis Mattoon Nov 24 '14 at 21:41
  • one issue you may encounter at some point is that `var productList = { daaa : function() {}};` is valid JS, but not valid JSON. – njzk2 Nov 24 '14 at 21:43
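To illustrate njzk2's point, here is a minimal sketch using only the standard library `json` module. The helper name `parses_as_json` is hypothetical. It shows that the literal from the question happens to be valid JSON, while a JS-only literal is rejected:

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is valid JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The literal from the question: double-quoted key and string value, so valid JSON.
question_literal = '{ "daaa" : "ddddd"}'
# njzk2's example: unquoted key and a function value, valid JS but not JSON.
js_only_literal = '{ daaa : function() {}}'

print(parses_as_json(question_literal))  # True
print(parses_as_json(js_only_literal))   # False
```

So `json.loads` is only a shortcut when you know the page always emits JSON-compatible literals; otherwise you need a real JavaScript parser or interpreter.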

2 Answers


I suggest you take a look at BeautifulSoup. It can help you extract the JavaScript code from an HTML file (but it cannot parse or run that code):

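from bs4 import BeautifulSoup

source = """<html>...</html>"""

# Name the parser explicitly; otherwise bs4 warns and picks one itself
soup = BeautifulSoup(source, "html.parser")
# Text content of the first <script> element
js_code = soup.find_all("script")[0].text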

Then you can use a JavaScript interpreter to run the code and read the variables. Several are available for Python, and a quick search will turn them up.
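If you would rather not add a third-party dependency, the extraction step can also be sketched with the standard library's `html.parser.HTMLParser` instead of BeautifulSoup (a swapped-in technique, not what the answer above uses). The class `ScriptCollector` and function `extract_scripts` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import List


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "SCRIPT" also matches here.
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # A script body may arrive in several chunks; concatenate them.
        if self._in_script:
            self.scripts[-1] += data


def extract_scripts(html: str) -> List[str]:
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return parser.scripts


page = """<html>
<body>
<script>
    var productList = { "daaa" : "ddddd"};
</script>
</body>
</html>"""

scripts = extract_scripts(page)
print(scripts[0].strip())  # var productList = { "daaa" : "ddddd"};
```

The result is still just the raw JavaScript source as a string; getting at `productList` itself requires an interpreter or some further parsing.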

Victor
  • What do you think of using a regexp instead to parse the extracted JavaScript? – parkerproject Nov 25 '14 at 03:16
  • @Parker, I am not sure that's a good idea; I've never tried to parse any programming language with regex myself, though. I guess you could try. Btw, you could also try [pyparsing](http://pyparsing.wikispaces.com/): it allows you to create your own parsers for different languages. – Victor Nov 25 '14 at 11:33
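For the narrow case in the question, a regex plus `json.loads` can work. The sketch below is a hedged illustration, not a general JavaScript parser: `extract_js_object` is a hypothetical helper, and it assumes the variable is declared as `var <name> = {...};` and that the object literal is valid JSON (see njzk2's comment on the question):

```python
import json
import re
from typing import Any, Optional


def extract_js_object(script: str, name: str) -> Optional[Any]:
    """Return the parsed value of `var <name> = {...};`, or None if not found.

    Fragile by design: the lazy match stops at the first `}` that is followed
    by optional whitespace and `;`, so a string value containing "};" breaks it.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(name) + r"\s*=\s*(\{.*?\})\s*;",
        re.DOTALL,  # let the object literal span multiple lines
    )
    match = pattern.search(script)
    if match is None:
        return None
    # Raises json.JSONDecodeError if the literal is JS-only (functions, unquoted keys).
    return json.loads(match.group(1))


script = 'var productList = { "daaa" : "ddddd"};'
print(extract_js_object(script, "productList"))  # {'daaa': 'ddddd'}
```

This only handles `var` declarations; `let`/`const`, computed values, or non-JSON literals are exactly where a real parser (or pyparsing, as suggested) becomes worth the effort.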

I think you need to add the type attribute so the computer can tell whether it is JavaScript or Python; use this:

<script type="text/javascript"> <!-- or python --> </script>
Michael Dorner
Ben Riley
  • Hello Ben Riley, welcome to Stack Overflow! This is not a complete answer; please go back and edit it to fully answer the question. – Elias Benevedes Nov 24 '14 at 21:53