I have an app that scrapes a website with Selenium. Before it can scrape, it has to fill in a form, once for each case I want to search, and the form has a captcha. I solve the captcha with OpenCV: I take a screenshot of the browser, crop it down to the captcha, process the cropped image, and read it with Pytesseract.
# Solving the captcha
driver.save_screenshot("screenshot.png")
codigoCaptcha = cv2.imread("screenshot.png")[617:662, 337:455]
# Processing image to be readable by Pytesseract
gry = cv2.cvtColor(codigoCaptcha, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))
cls = cv2.morphologyEx(gry, cv2.MORPH_CLOSE, None)
thr = cv2.threshold(
cls, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
txt = (pytesseract.image_to_string(thr, config=tessdata_dir_config)
).strip()
This works perfectly on my PC. After making some changes to deploy the app to Heroku, the scraper runs until it reaches the OpenCV part, where it raises an error. I'm almost sure this is because of the cv2.imread("screenshot.png") call, since Heroku is not saving the screenshot on the server to be read afterwards (because it can't).
2021-02-08T23:30:11.216914+00:00 app[web.1]: [2021-02-08 23:30:11,214] ERROR in app: Exception on /script [GET]
2021-02-08T23:30:11.216923+00:00 app[web.1]: Traceback (most recent call last):
2021-02-08T23:30:11.216926+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
2021-02-08T23:30:11.216926+00:00 app[web.1]: response = self.full_dispatch_request()
2021-02-08T23:30:11.216927+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
2021-02-08T23:30:11.216927+00:00 app[web.1]: rv = self.handle_user_exception(e)
2021-02-08T23:30:11.216927+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
2021-02-08T23:30:11.216928+00:00 app[web.1]: reraise(exc_type, exc_value, tb)
2021-02-08T23:30:11.216928+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
2021-02-08T23:30:11.216929+00:00 app[web.1]: raise value
2021-02-08T23:30:11.216930+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
2021-02-08T23:30:11.216930+00:00 app[web.1]: rv = self.dispatch_request()
2021-02-08T23:30:11.216930+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
2021-02-08T23:30:11.216931+00:00 app[web.1]: return self.view_functions[rule.endpoint](**req.view_args)
2021-02-08T23:30:11.216931+00:00 app[web.1]: File "/app/app.py", line 74, in script
2021-02-08T23:30:11.216932+00:00 app[web.1]: gry = cv2.cvtColor(codigoCaptcha, cv2.COLOR_BGR2GRAY)
2021-02-08T23:30:11.216933+00:00 app[web.1]: cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
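One detail in the traceback may be worth checking before blaming the file write. If cv2.imread had failed to read the file, it would return None and the slice on the same line would raise a TypeError, so execution would not reach cvtColor. The `!_src.empty()` assertion is instead consistent with an image that loaded but whose crop came back empty, which could happen if the screenshot taken on Heroku is smaller than the hard-coded crop region. The sketch below uses NumPy only; the helper name `crop_or_explain` and the 800x600 size are assumptions for illustration, not values taken from my app.

```python
import numpy as np


def crop_or_explain(img, top, bottom, left, right):
    """Return img[top:bottom, left:right], or raise a descriptive error
    instead of silently handing an empty array to OpenCV."""
    if img is None:
        raise ValueError("image is None: the file could not be read")
    height, width = img.shape[:2]
    if bottom > height or right > width:
        raise ValueError(
            f"crop [{top}:{bottom}, {left}:{right}] exceeds "
            f"image size {width}x{height}"
        )
    return img[top:bottom, left:right]


# Stand-in for a screenshot that is only 800x600 (an assumed size).
small = np.zeros((600, 800, 3), dtype=np.uint8)

# NumPy does not complain about an out-of-range slice; it returns an
# empty array, which is what cvtColor rejects.
print(small[617:662, 337:455].size)  # 0
```

Printing `img.shape` right after reading the screenshot on Heroku would show whether the crop coordinates still fall inside the image there.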
Is there any way to keep the screenshot in memory instead of writing it to a file, or is there another solution? I've read about S3, but I'm not sure about it yet.