1

I'm trying to download a captcha image whose URL and content change dynamically every time the page loads. I understand that I can take a screenshot of the browser and crop out the captcha image, but I'm not able to locate the captcha img element.

From the HTML source code I found this

// this iframe is used to generate the captcha

<iframe marginheight="0" marginwidth="0" scrolling="no" frameborder="0" width="203" height="53" name="Captcha" src="/efs/servlet/efs/jsp-ns/captcha.jsp"></iframe>

// when I open src="/efs/servlet/efs/jsp-ns/captcha.jsp", it leads me to this

<html>
<head><meta scheme='a1afcc517bec909bf5c3fddea7c83c3d' name='TSd58639' content='b133d7457db43c81' /> <meta scheme='eb1e31097f37b3d64bef23cbd5cab231' name='1000' content='5' /><!-- 9cc5da25f89a21d1fbb5ffa18da0bb73 --><script type="text/javascript">//<![CDATA[
eval(function(a){var f=a.split("");var c=f.length;var b=parseInt(f[0]+f[1],16);var e=String.fromCharCode(b);for(var d=2;d<c;d++){var g=(parseInt(f[d]+f[d+1],16)-b)%256;b=g;e+=String.fromCharCode(g);d++}return e}("288..."));
</script>
<script language="JavaScript">var pn = "CSRT"; var pv = '3642466061891909727';
eval(function(a){var f=a.split("");var d=f.length;var c=parseInt(f[0]+f[1],16);var e=String.fromCharCode(c);for(var b=2;b<d;b++){var g=(parseInt(f[b]+f[b+1],16)-c)%256;c=g;e+=String.fromCharCode(g);b++}return e}("288edbe3..."));
</script>

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Insert title here</title>
</head>
<body>
<img src="Captcha.jpg?t=1378993130057" border=1/>
</body>
</html>

The line '<img src="Captcha.jpg?t=1378993130057" border=1/>' defines the captcha URL, but the value 't=1378993130057' changes dynamically.

I've seen the thread "Download image with selenium python", but I don't understand how the authors worked out the image location, for example

img = browser.find_element_by_xpath('//*[@id="cryptogram"]')

or, for the Google captcha [http://www.google.com/recaptcha/demo/recaptcha]

img = driver.find_element_by_xpath('//div[@id="recaptcha_image"]/img')

I'm using Python 2.6 and Selenium to browse the site.

Update

try:
    browser.save_screenshot('screenshot.png')
    img = browser.find_element_by_xpath('//body/img')
    src = img.get_attribute('src')
    loc = img.location

except Exception,e:
    print e

output

Message: u'Unable to locate element: {"method":"xpath","selector":"//body/img"}' ; Stacktrace: 
    at FirefoxDriver.prototype.findElementInternal_ (file:///tmp/tmppjlmPW/extensions/fxdriver@googlecode.com/components/driver_component.js:8899)
    at FirefoxDriver.prototype.findElement (file:///tmp/tmppjlmPW/extensions/fxdriver@googlecode.com/components/driver_component.js:8908)
    at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmppjlmPW/extensions/fxdriver@googlecode.com/components/command_processor.js:10840)
    at DelayedCommand.prototype.executeInternal_ (file:///tmp/tmppjlmPW/extensions/fxdriver@googlecode.com/components/command_processor.js:10845)
    at DelayedCommand.prototype.execute/< (file:///tmp/tmppjlmPW/extensions/fxdriver@googlecode.com/components/command_processor.js:10787)

Update #2

from selenium import webdriver
import datetime
from selenium.webdriver.common.proxy import *


print '[+] Starts at '+ datetime.datetime.now().isoformat()

browser = webdriver.Firefox() 
browser.get("https://www.example.com") 


try:
    browser.save_screenshot('screenshot.png')
    img = browser.find_element_by_xpath('//body/img')
    src = img.get_attribute('src')
    loc = img.location

except Exception,e:
    print e


browser.delete_all_cookies()
browser.close()

print '[+] Done at ' + datetime.datetime.now().isoformat()

Any help is much appreciated.

Community
fooBar

2 Answers

1

You can locate the img tag by XPath, read its src attribute, and then download the image via urlretrieve:

import urllib

img = browser.find_element_by_xpath('//body/img')
src = img.get_attribute("src")
urllib.urlretrieve(src, "captcha.png")
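One thing that may explain the "Unable to locate element" error in the question: in the posted HTML the image sits inside the iframe named "Captcha", and Selenium looks elements up in the current document only. A minimal sketch, assuming the older Selenium method names used in this thread (newer releases spell these `switch_to.frame(...)` and `find_element(By.XPATH, ...)`); the helper names `find_captcha_image` and `leave_frame` are made up for illustration:

```python
def find_captcha_image(browser, frame_name="Captcha"):
    """Switch into the captcha iframe, then look up the <img> inside it.

    `browser` is assumed to be a Selenium WebDriver instance. The default
    frame name is taken from the iframe's name attribute in the question.
    """
    # Elements inside an <iframe> are not reachable from the top-level
    # document, so move the driver's context into the frame first.
    browser.switch_to_frame(frame_name)
    return browser.find_element_by_xpath('//body/img')


def leave_frame(browser):
    """Return to the top-level document once the image has been handled."""
    browser.switch_to_default_content()
```

With that in place, `img = find_captcha_image(browser)` followed by `img.get_attribute("src")` should behave like the snippet above, and `leave_frame(browser)` restores the original context afterwards.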
alecxe
  • Thanks for your quick response. I'm trying to understand your approach so I won't have to post a new thread each time I need help :) Did you define the XPath as ('//body/img') because of my HTML source code? I will try your code and update you – fooBar Sep 12 '13 at 14:15
  • @Hussam yeah, I've just looked at the HTML you've provided. Should work if the HTML is real. Let me know if you have problems with it. Thanks. – alecxe Sep 12 '13 at 16:33
  • Okay, I understand now. Please see the update above: I got an exception and I'm not sure what it means. Right now I'm trying to find a way to save the captcha image after finding its location. One interesting thing I noticed is that the output of browser.save_screenshot('screenshot.png') is missing the captcha image, so I think we could locate it successfully. If you can give me a hint for saving the image, that would be great. Thanks for your help – fooBar Sep 12 '13 at 17:42
  • @Hussam could you show the whole code you are using? Paste it on gist.github.com for example. – alecxe Sep 12 '13 at 17:46
  • Nope, I used the PIL library to crop the image. It sounds unprofessional, but it worked. Thanks for your help, sir – fooBar Sep 13 '13 at 15:19
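The PIL approach mentioned in the last comment was not posted, so the following is only a rough sketch of how such a crop could look, not the asker's actual code. It assumes `img.location` and `img.size` are the dictionaries Selenium returns for an element; `crop_box`, `crop_captcha`, and `frame_offset` are hypothetical names. The offset is there because an element inside an iframe may report its position relative to the frame rather than the page.

```python
def crop_box(location, size, frame_offset=(0, 0)):
    """Build a (left, top, right, bottom) box for PIL's Image.crop().

    `location` is a dict with 'x' and 'y', `size` a dict with 'width'
    and 'height'. `frame_offset` is the (x, y) position of the
    enclosing iframe on the page, if there is one.
    """
    left = int(location['x']) + int(frame_offset[0])
    top = int(location['y']) + int(frame_offset[1])
    right = left + int(size['width'])
    bottom = top + int(size['height'])
    return (left, top, right, bottom)


def crop_captcha(screenshot_path, out_path, box):
    """Cut `box` out of a saved screenshot and write it to `out_path`."""
    # Imported here so crop_box() stays usable without PIL installed.
    from PIL import Image
    Image.open(screenshot_path).crop(box).save(out_path)
```

A possible call, after `browser.save_screenshot('screenshot.png')`, would be `crop_captcha('screenshot.png', 'captcha.png', crop_box(img.location, img.size))`.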
0
import urllib

img = browser.find_element_by_css_selector("img[src*='Captcha.jpg']")

src = img.get_attribute("src")

urllib.urlretrieve(src, "Captcha.jpg")

Try this method and let me know whether it works.
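A caveat with urlretrieve: it makes a separate HTTP request that does not carry the browser's cookies, so the server may treat it as a different session. Below is a hedged sketch of passing the browser's cookies along. It assumes `browser.get_cookies()` returns a list of dicts with "name" and "value" keys; `cookie_header` and `download_with_cookies` are illustrative names. Even then, a server that generates a fresh captcha on every request may still return a different image from the one shown in the browser.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2, as used in this thread


def cookie_header(cookies):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


def download_with_cookies(src, path, cookies):
    """Fetch `src` with the given cookies and write the bytes to `path`."""
    request = Request(src, headers={"Cookie": cookie_header(cookies)})
    response = urlopen(request)
    try:
        data = response.read()
    finally:
        response.close()
    # Binary mode, since the payload is image data.
    with open(path, "wb") as handle:
        handle.write(data)
```

Usage would look like `download_with_cookies(src, "Captcha.jpg", browser.get_cookies())`.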

Sathish D
user3487861