0

I'm trying to extract all the images from a page. I have used Mechanize, urllib, and Selenium to fetch the HTML, but the part I want to extract is never there. When I view the page source, I can't see that part either. Instead of the description I want to extract, there is this:

 <div class="loading32"></div>
 </div>

 </div>
 </div>

But if I view it using the Inspect Element option, it's there. Is there an easy way to figure out what this script does without any JavaScript knowledge, so I can bypass it? Or is there a way to get the equivalent of Inspect Element using Selenium in Python 2.7? What is the difference between View Page Source and Inspect Element anyway?

burning_wipf

2 Answers

1

You're possibly trying to get elements that are created by a client-side script. I don't think JavaScript runs when you just send a GET/POST request (which is what I'm assuming you mean by "view source"), so anything it creates won't be in the response.

John
  • No, I actually meant view source. Like in Firefox: manually right-click and then choose View Source or Inspect Element. And I'm fairly sure it's a client-side script; the page is swarming with them. But I thought I would solve that problem using Selenium. That did not work, though. – burning_wipf Sep 23 '16 at 17:24
  • Well then, all that does is send a GET request and show the returned source, to my knowledge. To get what you want, you're going to have to either run or parse the JavaScript that creates the items you need. – John Sep 23 '16 at 17:26
  • Well, I thought I did that already using Selenium, as it is a functional browser that runs all the scripts on the entire page, but as I said, this did not work. So the only solution I have would be to find the JavaScript that makes the GET request, translate it to Python, and include it in my code. Is that right? The JavaScript on the page is one big mess; is there an easier way to run the script? – burning_wipf Sep 23 '16 at 17:45
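One possible reason a Selenium attempt still shows the placeholder is that the HTML was read before the scripts finished. View Page Source shows the raw response to the GET request, while Inspect Element shows the live DOM after scripts have run. The sketch below waits until the `loading32` placeholder from the question has disappeared before collecting image URLs. It is only a sketch under assumptions: the function names and the timeout are hypothetical, it uses the Selenium 4 Python API rather than the Python 2.7 setup in the question, and it assumes Firefox with its driver is installed.

```python
def still_loading(html, placeholder_class="loading32"):
    """Return True while the loading placeholder from the question is in the HTML."""
    return 'class="%s"' % placeholder_class in html


def rendered_image_urls(url, timeout=15):
    """Open url in a real browser, wait for the scripts, return all <img> src values.

    Hypothetical helper: assumes the page removes the loading32 div once the
    content has been loaded.
    """
    # Imported here so still_loading() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # page_source reflects the current DOM, so poll it until the
        # placeholder is gone (raises TimeoutException after `timeout` seconds).
        WebDriverWait(driver, timeout).until(
            lambda d: not still_loading(d.page_source)
        )
        return [img.get_attribute("src")
                for img in driver.find_elements(By.TAG_NAME, "img")]
    finally:
        driver.quit()
```

If the wait times out, the content is probably loaded some other way (for example inside an iframe), and the placeholder check would need to be adapted.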
0

At the time I was not aware of how much content is loaded via JavaScript after the page has loaded. Mechanize does not have a JavaScript interpreter. I ended up solving this by extracting the links from the *.js file, repeating the GET request with urllib, and getting the required content that way.
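A rough sketch of that approach, not the exact code used: pull quoted URLs out of the JavaScript source with a regular expression, resolve them against the page URL, and request them directly. The regular expression, the function names, and the `example.com` addresses are all assumptions for illustration, and this uses Python 3's `urllib.request` rather than Python 2.7's `urllib`.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Quoted strings that look like absolute, protocol-relative, or
# root-relative URLs. This is a heuristic and will miss URLs that the
# script builds by string concatenation.
URL_IN_QUOTES = re.compile(r"""["']((?:https?:)?//[^"'\s]+|/[^"'\s]+)["']""")


def extract_urls(js_source, base_url):
    """Return the unique URLs found in js_source, resolved against base_url."""
    found = []
    for match in URL_IN_QUOTES.finditer(js_source):
        absolute = urljoin(base_url, match.group(1))
        if absolute not in found:
            found.append(absolute)
    return found


def fetch(url, timeout=10):
    """Repeat the GET request the script would have made and return the raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

A typical use would be to download the *.js file with `fetch`, decode it, pass the text to `extract_urls`, and then call `fetch` again on whichever URL returns the missing content. Some endpoints also expect specific headers or query parameters, which would have to be copied from the browser's network tab.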

burning_wipf