I want to capture a screenshot of a specific web element using Selenium WebDriver and compare the text it shows with existing text data. How can I access the text content of the screenshot once it has been captured? I would also like to locate the element with an XPath or another selector, since the element does not have an ID. Thank you.

  • This is complicated because recognition depends on the font size and typeface used in the image. You need either an existing optical character recognition (OCR) program or API (in Java or whatever language you use), or one you develop yourself. Writing your own means going through the pixels and working out which character each range of pixels represents, a technique that is common in manufacturing. Taking a screenshot in Selenium is the easy part; the OCR step that follows is the challenging one. – JustBeingHelpful Mar 06 '23 at 05:43
  • https://www.guru99.com/take-screenshot-selenium-webdriver.html – JustBeingHelpful Mar 06 '23 at 05:49
  • https://stackoverflow.com/questions/3422262/how-can-i-take-a-screenshot-with-selenium-webdriver – JustBeingHelpful Mar 06 '23 at 06:44

1 Answer

We can take a screenshot of the WebElement (in this example, an img tag containing a captcha) and then read the text in the screenshot with the ddddocr library.

Here is the solution:

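from selenium import webdriver
from selenium.webdriver.common.by import By

import ddddocr

driver = webdriver.Chrome()

driver.get('https://ma.mohw.gov.tw/masearch/')

# Locate the captcha <img> element. By.XPATH or By.CSS_SELECTOR work the
# same way when the element has no ID.
captcha = driver.find_element(By.ID, "ctl00_ContentPlaceHolder1_ImageCheck")
# Save a screenshot of just this element
captcha.screenshot('captcha.png')

ocr = ddddocr.DdddOcr()
# Open and read the image as bytes
with open('captcha.png', 'rb') as f:
    img_bytes = f.read()

# Run OCR on the image bytes and print the recognized text
res = ocr.classification(img_bytes)
print(res.upper())
# Example output: PUT7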
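
Since the question also asks for an XPath locator and a comparison with existing text, here is a minimal sketch of how that part might look. The helper names (`normalize`, `texts_match`, `read_element_text`) are hypothetical and not part of Selenium or ddddocr. The normalization rule (uppercase, ASCII letters and digits only) is an assumption suited to captcha-like text; adjust it for other content.

```python
import re


def normalize(text: str) -> str:
    """Uppercase the text and keep only letters A-Z and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def texts_match(ocr_text: str, expected: str) -> bool:
    """Compare OCR output with the expected text after normalizing both."""
    return normalize(ocr_text) == normalize(expected)


def read_element_text(driver, xpath: str) -> str:
    """Screenshot the element found by `xpath` and return the OCR result."""
    # Imported here so the comparison helpers above can be used
    # without Selenium or ddddocr installed.
    from selenium.webdriver.common.by import By
    import ddddocr

    element = driver.find_element(By.XPATH, xpath)
    img_bytes = element.screenshot_as_png  # PNG bytes, no temporary file needed
    return ddddocr.DdddOcr().classification(img_bytes)
```

With a `driver` opened as in the code above, `texts_match(read_element_text(driver, "//img[contains(@id, 'ImageCheck')]"), expected)` would report whether the recognized text equals your stored value. The XPath shown is only an assumed example for this page. OCR can misread characters, so an exact match may fail even when the image is correct.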
Ajeet Verma