
I'm trying to extract some JavaScript variables from an HTML page using Python:

<script>
var nData = new Array();
var Data = "5b7b......";
nData = CallInit(Data);
...
...
</script>

I can see the content of "nData" in Firebug (DOM panel) without any problem:

[Object { height="532",  width="1280",  url="https://example.org...8EDA4F3F5F395B9&key=lh1",  more...}, Object { height="266",  width="640",  url="https://example.org...8EDA4F3F5F395B9&key=lh1",  more...}]

Each entry of nData contains a URL. How can I parse/extract the content of nData in Python? Is it possible?

Thanks

Reat0ide
  • Can you give us a link to the site? – halex Apr 17 '15 at 09:23
  • Can you modify the source code in a JS context before handing it to Python? For example: open the web page, insert a JS write statement, and save the result as HTML. That way the variable is written out as HTML first and can then be parsed with Python. – wenzul Apr 17 '15 at 09:24
  • If not, you need some kind of JavaScript runtime environment. Maybe check out the answers to http://stackoverflow.com/questions/2346584/conversion-from-javascript-to-python-code and http://stackoverflow.com/questions/2894946/passing-javascript-variable-to-python. – wenzul Apr 17 '15 at 09:30
  • @wenzul No, I'm only trying to extract the URL from the site and use it in a Python script. – Reat0ide Apr 17 '15 at 09:33

1 Answer


With the help of the Python library Ghost.py it should be possible to get a dynamic variable out of executed JavaScript code.

I just tried it on a small test page and retrieved a JavaScript variable named a, which that page defines, as a Python object. I did the following:

  1. Install Ghost.py with pip install Ghost.py.

  2. Install PySide (it's a prerequisite for Ghost.py) with pip install PySide.

  3. Use the following Python code:

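    from ghost import Ghost

    ghost = Ghost()
    # Load the page so that its scripts are executed.
    ghost.open('https://dl.dropboxusercontent.com/u/13991899/test/index.html')
    # Evaluate the JavaScript expression `a` in the context of the loaded page.
    js_variable, _ = ghost.evaluate('a', expect_loading=True)
    print(js_variable)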

You should be able to get your variable nData into the Python variable js_variable by opening your site with ghost.open and then calling ghost.evaluate('nData').
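
Once the value is on the Python side, pulling out the URLs is ordinary list and dict handling. The following is a minimal sketch, assuming the JavaScript array of objects arrives as a Python list of dicts shaped like the Firebug output in the question. The sample data, its URLs, and the helper `largest_image_url` are hypothetical placeholders, not values taken from the real site.

```python
# Hypothetical stand-in for the value returned by ghost.evaluate('nData').
# The URLs are placeholders; the real ones are truncated in the question.
js_variable = [
    {"height": "532", "width": "1280", "url": "https://example.org/large"},
    {"height": "266", "width": "640", "url": "https://example.org/small"},
]


def largest_image_url(items):
    """Return the url of the entry with the greatest width."""
    # The question shows width as a string, so convert it before comparing.
    best = max(items, key=lambda item: int(item["width"]))
    return best["url"]


# Collect every URL, then pick the one belonging to the widest entry.
urls = [item["url"] for item in js_variable]
print(urls)
print(largest_image_url(js_variable))
```

If the real entries carry other keys (the Firebug output shows "more..."), they are simply ignored by this sketch.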

halex
  • Cool, I didn't know Ghost, only mechanize and the like. If you just want to retrieve the URLs, you could look into `CallInit` and build the URLs in Python with `Data` as a parameter. – wenzul Apr 17 '15 at 10:48
  • Is it possible to do the same using mechanize? – Reat0ide Apr 17 '15 at 15:11
  • Please update the answer based on the information on the Ghost library's official website. I found that in its newest version the Ghost class only has ghost.start(), and it uses sessions to manage the crawling. – windsound Oct 24 '16 at 02:13
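
wenzul's suggestion of reimplementing `CallInit` in Python can be sketched without any browser. The `Data` string in the question starts with `5b7b`, which is the hexadecimal encoding of `[{`, so one plausible guess is that `CallInit` hex-decodes the string and parses it as JSON. That is only an assumption: the real `CallInit` is not shown, and the sample payload below is made up because the real `Data` value is elided. The helper names `extract_data_literal` and `decode_hex_json` are hypothetical.

```python
import json
import re


def extract_data_literal(html):
    """Return the string literal assigned to `var Data`, or None if absent."""
    match = re.search(r'var\s+Data\s*=\s*"([^"]*)"', html)
    return match.group(1) if match else None


def decode_hex_json(hex_text):
    """Assumed stand-in for CallInit: hex digits -> UTF-8 text -> JSON."""
    return json.loads(bytes.fromhex(hex_text).decode("utf-8"))


# Build a made-up page whose Data value is hex-encoded JSON.
sample_items = [{"height": "532", "width": "1280", "url": "https://example.org/a"}]
sample_hex = json.dumps(sample_items).encode("utf-8").hex()
sample_html = (
    "<script>\n"
    'var Data = "%s";\n'
    "nData = CallInit(Data);\n"
    "</script>" % sample_hex
)

# Extract the literal from the page source and decode it.
decoded = decode_hex_json(extract_data_literal(sample_html))
print([item["url"] for item in decoded])
```

If `CallInit` does anything beyond hex decoding (decryption, decompression), this sketch will not work as is, and running the page's JavaScript as in the answer above remains the more reliable route.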