How to process images on Heroku

Question

I have an app that scrapes a website with Selenium. Before each search it has to fill in a form, many times over (once for each case I want to look up), and that form has a captcha. I solve the captcha with OpenCV and Pytesseract: I take a screenshot of the browser, crop the screenshot down to the captcha, preprocess that image, and read it with Pytesseract.

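```python
import cv2
import pytesseract

# `driver` (the Selenium WebDriver) and `tessdata_dir_config` are set up elsewhere in the app

# Solving the captcha: screenshot the page, then crop the captcha region
driver.save_screenshot("screenshot.png")
codigoCaptcha = cv2.imread("screenshot.png")[617:662, 337:455]

# Processing the image so Pytesseract can read it
gry = cv2.cvtColor(codigoCaptcha, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w * 2, h * 2))  # upscale 2x before OCR
cls = cv2.morphologyEx(gry, cv2.MORPH_CLOSE, None)
thr = cv2.threshold(
    cls, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]  # Otsu binarization

txt = pytesseract.image_to_string(thr, config=tessdata_dir_config).strip()
```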

This works perfectly on my PC. After making some changes to deploy the app to Heroku, the scraper runs until the OpenCV part, where it raises the error below. I'm almost sure the cause is `cv2.imread("screenshot.png")`, since Heroku doesn't keep the screenshot on the server so it can be read afterwards (because it can't).

```
2021-02-08T23:30:11.216914+00:00 app[web.1]: [2021-02-08 23:30:11,214] ERROR in app: Exception on /script [GET]
2021-02-08T23:30:11.216923+00:00 app[web.1]: Traceback (most recent call last):
2021-02-08T23:30:11.216926+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
2021-02-08T23:30:11.216926+00:00 app[web.1]:     response = self.full_dispatch_request()
2021-02-08T23:30:11.216927+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
2021-02-08T23:30:11.216927+00:00 app[web.1]:     rv = self.handle_user_exception(e)
2021-02-08T23:30:11.216927+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
2021-02-08T23:30:11.216928+00:00 app[web.1]:     reraise(exc_type, exc_value, tb)
2021-02-08T23:30:11.216928+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
2021-02-08T23:30:11.216929+00:00 app[web.1]:     raise value
2021-02-08T23:30:11.216930+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
2021-02-08T23:30:11.216930+00:00 app[web.1]:     rv = self.dispatch_request()
2021-02-08T23:30:11.216930+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
2021-02-08T23:30:11.216931+00:00 app[web.1]:     return self.view_functions[rule.endpoint](**req.view_args)
2021-02-08T23:30:11.216931+00:00 app[web.1]:   File "/app/app.py", line 74, in script
2021-02-08T23:30:11.216932+00:00 app[web.1]:     gry = cv2.cvtColor(codigoCaptcha, cv2.COLOR_BGR2GRAY)
2021-02-08T23:30:11.216933+00:00 app[web.1]: cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-_a0ur5ao/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
```

Is there any way to keep the screenshot in memory instead of saving it to a file, or is there another solution? I've read about S3, but I'm not sure about it yet.

`cv2.imread()` seems to only accept a filepath string. But with PIL you can load from a [stream/bytes in memory](https://stackoverflow.com/questions/32908639/open-pil-image-from-byte-file) or from a [URL](https://stackoverflow.com/questions/7391945/how-do-i-read-image-data-from-a-url-in-python). Not sure if any of those help you; I don't know anything about Heroku. — Reti43, Feb 08 '21 at 23:59
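Following this comment's PIL route, the bytes can be opened through an in-memory stream and then converted to a NumPy array for the OpenCV steps. PIL returns channels in RGB order while OpenCV expects BGR, so the last axis is reversed. A sketch; the helper name `pil_png_to_bgr` is hypothetical:

```python
import io

import numpy as np
from PIL import Image


def pil_png_to_bgr(png_bytes):
    """Open PNG bytes with PIL and return an OpenCV-style BGR uint8 array."""
    rgb = Image.open(io.BytesIO(png_bytes)).convert("RGB")
    # PIL gives RGB order; reverse the channel axis to get OpenCV's BGR order
    return np.ascontiguousarray(np.asarray(rgb)[:, :, ::-1])
```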
I think your issue is somewhere else. Heroku dynos can write to the filesystem, but the filesystem is wiped after a restart (https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem). I think the image does not exist, or at least not in the folder where you expect it to be. — Tin Nguyen, Feb 09 '21 at 09:16
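Since the dyno can write files, the file-based approach can also work as long as the screenshot is written and read through the same absolute path within one request. A relative name like `"screenshot.png"` is resolved against the process's current working directory, and Selenium's `save_screenshot` returns `False` instead of raising if it cannot write the file. A sketch that uses the system temp directory (an assumption: any writable directory would do); `screenshot_path` is a hypothetical helper:

```python
import os
import tempfile


def screenshot_path(filename="screenshot.png"):
    """Absolute path in the system temp directory.

    On Heroku this location is writable, but files there are discarded when
    the dyno restarts, so write and read within the same request.
    """
    return os.path.join(tempfile.gettempdir(), filename)


# Hypothetical usage with the existing driver (not run here):
# path = screenshot_path()
# if not driver.save_screenshot(path):  # False if the file could not be written
#     raise OSError(f"could not write screenshot to {path}")
# screenshot = cv2.imread(path)
```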
