
Is it possible to write a Python program that uses mechanize to browse a given website automatically and detect whether it opens popup windows (for example, advertisements or download prompts)? I would appreciate any hint, and I would be very happy if you could point me to a library that does this.


2 Answers


Mechanize cannot handle JavaScript, and therefore cannot handle popup windows either.

To accomplish this, you need a real browser, headless or not. This is where Selenium helps, because it has built-in support for popup dialogs:

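Selenium WebDriver has built-in support for handling popup dialog boxes. After you've triggered an action that would open a popup, you can access the alert as follows (older Selenium releases spelled this `driver.switch_to_alert()`, which current releases no longer provide):

```python
alert = driver.switch_to.alert
```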

Example (using a jsfiddle demo page whose button opens a confirmation dialog):

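```python
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "http://fiddle.jshell.net/ebkXh/show/"
driver = webdriver.Firefox()
driver.get(url)

# Selenium 4 replaced find_element_by_xpath() with find_element(By.XPATH, ...)
button = driver.find_element(By.XPATH, '//button[@type="submit"]')

# open the dialog, then dismiss it (like clicking "Cancel")
button.click()
driver.switch_to.alert.dismiss()

# open it again, then accept it (like clicking "OK")
button.click()
driver.switch_to.alert.accept()

driver.quit()
```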
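`switch_to.alert` only covers JavaScript dialogs (`alert`, `confirm`, `prompt`). Ads that open a separate browser window through `window.open` show up instead as extra entries in `driver.window_handles`. A minimal sketch, assuming you record the main window's handle before loading the page; `close_popup_windows` is a hypothetical helper, and a popup that opens after a delay is only caught if you call it late enough (or wait first).

```python
def close_popup_windows(driver, main_handle):
    """Close every browser window except main_handle.

    Returns the number of windows closed. `driver` is a Selenium WebDriver;
    `main_handle` should be driver.current_window_handle, recorded before
    the page had a chance to open popups.
    """
    closed = 0
    # Iterate over a copy: closing windows changes driver.window_handles.
    for handle in list(driver.window_handles):
        if handle != main_handle:
            driver.switch_to.window(handle)  # focus the popup window
            driver.close()                   # close the focused window
            closed += 1
    # Selenium does not switch back by itself after close().
    driver.switch_to.window(main_handle)
    return closed


# Usage (assuming a live `driver` and a `url`):
#   main = driver.current_window_handle
#   driver.get(url)
#   print(close_popup_windows(driver, main), "popup window(s) closed")
```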


alecxe
  • Is the Selenium API able to close the popup when switching to it? My aim is to detect popups on a given webpage and close them automatically. –  Jun 16 '14 at 12:46
  • @begueradj yes, you can `switch_to_alert()` and `accept()` or `dismiss()` it, see this thread for more info: http://stackoverflow.com/questions/8631500/click-the-javascript-popup-through-webdriver. – alecxe Jun 16 '14 at 12:49
  • That is exactly what I am looking for. Simple and efficient. Thank you. I want to award you the 100 points but it says I must wait 22 hours to be able to do it. –  Jun 16 '14 at 12:50
  • @begueradj glad it helped. I'm going to provide you with an example also. – alecxe Jun 16 '14 at 12:59
  • I did not downvote! I tested your code and it works perfectly. It was not me; why would I, when your answer is exactly what I have been looking for for almost 2 weeks? –  Jun 16 '14 at 15:12
  • @begueradj nono, it is not about you. I just cannot understand the reason for downvoting. Since there is no explanation, there is probably a personal reason. Since only the question and my answer received downvotes, I suspect it is coming from another answerer. Though, it is what it is, just glad you solved the problem. Thanks. – alecxe Jun 16 '14 at 15:15
  • One last question if I may, do you think it is possible to browse all the pages of a given website using selenium ? –  Jun 16 '14 at 17:45
  • @begueradj sure: you can recursively extract the links and follow them. You can also combine Selenium with [`Scrapy`](http://scrapy.org/). But that is really a different topic; you could ask it as a separate question here on SO. – alecxe Jun 16 '14 at 18:39
  • I made a big mistake: I awarded the 100 points to the answerer below :( I am so sorry; my connection is so bad that I got frustrated and clicked in the wrong place :( –  Jun 17 '14 at 13:18
  • @begueradj don't worry, that's ok :) – alecxe Jun 17 '14 at 13:23
  • It is not ok for me. I have been facing this problem for 2 weeks, so I feel ashamed that I made this mistake now that you are the only one who helped me :(. I asked a question about my mistake here: http://meta.stackoverflow.com/questions/260795/how-to-take-back-my-bounty-and-give-it-to-the-right-answerer –  Jun 17 '14 at 13:26
  • I will follow the advice in the answer there once I am able to (I must have at least 150 points). I keep you in mind because your answer is more than useful for me (I need it in a big project). It is just a question of being fair to you. Regards. http://meta.stackoverflow.com/questions/260795/how-to-take-back-my-bounty-and-give-it-to-the-right-answerer –  Jun 17 '14 at 13:38
  • @begueradj you have already done a lot of things other people would not do; you showed your concern, and that is really enough for me. I appreciate it. You are a good questioner and you deserve the points you have. Besides, it is not about points for me; it is about helping people and solving interesting problems, and points and badges are just bonuses. Thank you, and don't worry about the bounty. – alecxe Jun 17 '14 at 13:45
  • I did not want to bother you with my question again, but early this morning I asked this [question](http://stackoverflow.com/questions/24257802/how-to-browse-a-whole-website-using-selenium) and have not received any answer yet. Thank you in advance. –  Jun 17 '14 at 14:11
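The recursive approach suggested in the comments (extract the links, then follow them) can be sketched as a breadth-first crawl that stays on the starting host. This is a minimal sketch. The page-fetching part is passed in as a hypothetical `get_links` callback so it can be backed by Selenium or anything else, and `max_pages` is an arbitrary safety limit.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_pages=50):
    """Breadth-first crawl restricted to start_url's host.

    get_links(url) must return the raw href values found on that page.
    Returns the visited URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            if not href:
                continue
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            # Skip mailto:, javascript:, other hosts and already-seen pages.
            if (parsed.scheme in ("http", "https")
                    and parsed.netloc == host
                    and absolute not in seen):
                seen.add(absolute)
                queue.append(absolute)
    return visited


# With Selenium, get_links could look like this (assuming `driver` and
# `from selenium.webdriver.common.by import By` are already set up):
#   def get_links(url):
#       driver.get(url)
#       return [a.get_attribute("href")
#               for a in driver.find_elements(By.TAG_NAME, "a")]
```

Keeping the link extraction behind a callback also makes it easy to run the popup checks on each page before moving on.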

Unfortunately, Mechanize's browser seems to skip the pop-ups, so the title, URL, and HTML are identical for pop-ups and normal pages.
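Because mechanize never executes JavaScript, the most it can offer is a static look at the HTML it downloads. The sketch below (standard library only) flags markup that *could* open a popup: `target="_blank"` links, and `window.open(`, `alert(` or `confirm(` calls in inline scripts or `onclick`-style event handlers. The pattern list and the `popup_hints` name are assumptions for illustration; scripts loaded from external files or obfuscated code will not be detected, so treat the result as a hint, not a verdict.

```python
import re
from html.parser import HTMLParser

# Illustrative patterns only; real ad scripts are often obfuscated.
_SCRIPT_PATTERNS = {
    "window.open": re.compile(r"\bwindow\.open\s*\("),
    "alert": re.compile(r"\balert\s*\("),
    "confirm": re.compile(r"\bconfirm\s*\("),
}


class _PopupHintParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hints = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            self._in_script = True
        if tag == "a" and (attrs.get("target") or "").lower() == "_blank":
            self.hints.append("target=_blank link")
        # Inline event handlers such as onclick="window.open(...)".
        for name, value in attrs.items():
            if name.startswith("on") and value:
                self._scan(value)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside <script> is treated as code.
        if self._in_script:
            self._scan(data)

    def _scan(self, code):
        for label, pattern in _SCRIPT_PATTERNS.items():
            if pattern.search(code):
                self.hints.append(label)


def popup_hints(html):
    """Return popup-related hints found in a static HTML string."""
    parser = _PopupHintParser()
    parser.feed(html)
    parser.close()
    return parser.hints
```

With mechanize you could feed it the decoded page, e.g. `popup_hints(br.open(url).read().decode("utf-8", "replace"))` where `br` is a `mechanize.Browser()`.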

Frankly, IMHO Python is not the right tool for this job and lags behind in this respect. Having spent months doing web crawling, I find that for sites that use JavaScript extensively (and their number keeps growing), JavaScript-based environments like PhantomJS or SlimerJS are simply better suited to what you're trying to do.

If you have the luxury of using a JavaScript-based environment, I'd say go right ahead. However, you can still use Python. PhantomJS embeds Ghost Driver, a WebDriver implementation, so you can drive PhantomJS from Python with Selenium. You can also use Ghost.py, a Python library that provides a similar scriptable WebKit browser.

Tamer Tas
  • Thank you for all the links. I have been (quickly) browsing around PhantomJS to see if it deals with popups, but I haven't found anything yet. Whatever third-party technology I use, it must interact with Python, because my main program is written in Python. –  Jun 16 '14 at 12:13
  • You can check `Ghost.py` at the end of my answer. – Tamer Tas Jun 16 '14 at 12:16
  • Yes, I saw all the links you gave me. Now I am just checking whether PhantomJS can detect popup windows, so that I can call it from a Python program. –  Jun 16 '14 at 12:20
  • Perhaps [this](https://github.com/ariya/phantomjs/issues/11325) might give you a taste. – Tamer Tas Jun 16 '14 at 12:24