
I'm trying to run a Python Scrapy scraper, driven by a Flask app, on a VPS. The scraper runs fine when I start it with the following command:

python3 scraper_app.py

That works and I get the desired result. But if I close the SSH session (PuTTY), the process is killed along with it, so I'm trying to run it as a service instead. I've set up everything needed for that, and I start it with:

systemctl start scrapy.service

The service starts fine, but while it is running the following error occurs:

subprocess.CalledProcessError: Command '['scrapy', 'crawl', 'bcpa', '-a', 'file=uploads/broward_err.csv']' returned non-zero exit status 2.

The related code is as follows.

subprocess.check_output(['scrapy', 'crawl', spider_name, '-a', f'file={filename}'])

Why does this happen under the service when the same code works when run from a shell? Please help.
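
When a command behaves differently in a shell and in a service, it can help to log what the process sees in each case. A relative path such as `uploads/broward_err.csv` is resolved against the current working directory, and `scrapy` is looked up on `PATH`, and both may differ under a service. Whether either one is the cause here is an assumption; the log does not confirm it. The helper name `describe_environment` below is hypothetical, a minimal sketch to call once from the shell and once from the service and compare.

```python
import os
import shutil
import sys


def describe_environment(command="scrapy"):
    """Collect facts that often differ between a login shell and a service."""
    return {
        "cwd": os.getcwd(),                         # base for relative paths
        "path": os.environ.get("PATH", ""),         # where commands are searched
        "resolved_command": shutil.which(command),  # None if not found on PATH
        "python": sys.executable,                   # interpreter actually in use
    }


if __name__ == "__main__":
    # Write to stderr and flush so the lines are not held back in a buffer.
    for key, value in describe_environment().items():
        print(f"{key}: {value}", file=sys.stderr, flush=True)
```

If the two runs report a different `cwd` or a `resolved_command` of `None`, that difference is worth investigating first.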

Here is the error log (journalctl -u xxx.service):

Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Serving Flask app "scraper_app" (lazy loading)
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Environment: production
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:    WARNING: This is a development server. Do not use it in a production deployment.
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:    Use a production WSGI server instead.
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Debug mode: on
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Running on http://216.158.230.201:5000/ (Press CTRL+C to quit)
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Restarting with stat
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Debugger is active!
Nov 17 10:17:09 ubuntu18.is.cc env[9486]:  * Debugger PIN: 446-889-261
Nov 17 10:17:13 ubuntu18.is.cc env[9486]: 3.83.20.243 - - [17/Nov/2019 10:17:13] "GET /upload HTTP/1.1" 200 -
Nov 17 10:17:22 ubuntu18.is.cc env[9486]: 3.83.20.243 - - [17/Nov/2019 10:17:22] "POST /upload HTTP/1.1" 302 -
Nov 17 10:17:23 ubuntu18.is.cc env[9486]: 3.83.20.243 - - [17/Nov/2019 10:17:23] "GET /run?file=uploads%2Fbroward_err.csv&spider=bcpa&county=broward&sunbiz_address= HTTP/1.1" 500 -
Nov 17 10:17:23 ubuntu18.is.cc env[9486]: Traceback (most recent call last):
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2463, in __call__
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     return self.wsgi_app(environ, start_response)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2449, in wsgi_app
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     response = self.handle_exception(e)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1866, in handle_exception
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     reraise(exc_type, exc_value, tb)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     raise value
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     response = self.full_dispatch_request()
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     rv = self.handle_user_exception(e)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     reraise(exc_type, exc_value, tb)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     raise value
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     rv = self.dispatch_request()
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     return self.view_functions[rule.endpoint](**req.view_args)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/root/LLC-lookup-scrapy/scraper_app.py", line 132, in run_spider
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     subprocess.check_output(['scrapy', 'crawl', spider_name, '-a', f'file={filename}'])
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     **kwargs).stdout
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:   File "/usr/lib/python3.6/subprocess.py", line 438, in run
Nov 17 10:17:23 ubuntu18.is.cc env[9486]:     output=stdout, stderr=stderr)
Nov 17 10:17:23 ubuntu18.is.cc env[9486]: subprocess.CalledProcessError: Command '['scrapy', 'crawl', 'bcpa', '-a', 'file=uploads/broward_err.csv']' returned non-zero exit status 2.
thinkmore
  • You'll probably need to capture the output of the `scrapy crawl...` command to figure out what's going. Possibly if you look at the logs for your service (`journalctl -u scrapy.service`) you'll see something. – larsks Nov 17 '19 at 17:11
  • I've attached the log from `journalctl -u xxx.service`. – thinkmore Nov 17 '19 at 19:10
  • @JingzhunLi: That’s not output from `scrapy` itself, though, but from your script that was already reporting the error. – Davis Herring Nov 17 '19 at 20:06
  • @DavisHerring, so what's your opinion? – thinkmore Nov 18 '19 at 01:45
  • @JingzhunLi: Uh… that you should provide more information about the process that “actually” failed if you want an answer? – Davis Herring Nov 18 '19 at 01:52
  • Yes, I also want to see the output from the Scrapy spider, but I don't know how to get the log from the spider call. If you know a way, please tell me. – thinkmore Nov 18 '19 at 03:02
  • https://stackoverflow.com/a/24850026/939364 – Gallaecio Nov 18 '19 at 11:19
  • In the script I use both `print(e.output)` and `logging.info(e.output)`, but I can't find the logged string. Only Python's main error message is logged; the other output is not. Where can I find everything that I log in my script? – thinkmore Nov 28 '19 at 01:55
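
The comments above ask for the output of the failing `scrapy crawl` process itself. By default `subprocess.check_output` captures only stdout, so anything the child writes to stderr never reaches `e.output`. Output printed from a process that is not attached to a terminal may also sit in a buffer unless it is flushed. The wrapper below is a minimal sketch under those assumptions; `run_and_report` is a hypothetical name, not part of the original script.

```python
import subprocess
import sys


def run_and_report(cmd, cwd=None):
    """Run cmd and return its combined stdout and stderr as text.

    On a non-zero exit status, write the child's output to this
    process's stderr (flushed) and re-raise the original exception.
    """
    try:
        return subprocess.check_output(
            cmd,
            cwd=cwd,                   # optional working directory for the child
            stderr=subprocess.STDOUT,  # merge the child's stderr into the capture
            universal_newlines=True,   # decode bytes to str (available on Python 3.6)
        )
    except subprocess.CalledProcessError as e:
        print(
            f"{cmd!r} exited with status {e.returncode}; output:\n{e.output}",
            file=sys.stderr,
            flush=True,
        )
        raise
```

In the view function this would replace the direct call, for example `run_and_report(['scrapy', 'crawl', spider_name, '-a', f'file={filename}'])`. The child's own message should then appear next to the traceback in the service log.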

0 Answers