
I have a crawler which I want to run every time a person goes to the link. Since all the other modules are in Flask, I was told to build this in Flask also. I have installed scrapy and selenium, both in the virtual environment and globally on the machine with root.

When I run the crawler through the terminal, everything works fine. When I start the Flask application and visit xx.xx.xx.xx:8080/whats in the browser, this also works: it runs my crawler and returns the file. But as soon as I go live, so that anyone can visit the link, the browser gives me an internal server error.

In order to run the crawler, we have to type "scrapy crawl whateverthespidernameis" in the terminal. I did this using Python's os module.

Here is my Flask code:

import os
import sys
from flask import Flask, send_file
#from application1 import *
from main import *
from test123 import *

app = Flask(__name__)

@app.route('/whats')
def whats():
    os.chdir("/var/www/myapp/whats")
    # cmd = "scrapy crawl whats"
    cmd = "sudo scrapy crawl whats"
    os.system(cmd)
    return send_file("/var/www/myapp/staticcsv/whats.csv", as_attachment=True)

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080, debug=True)

This is the error recorded in the log file when I run through live link:

sh: 1: scrapy: not found

This is the error recorded in the log file when I use sudo in the command (variable cmd):

sudo: no tty present and no askpass program specified

I am using uwsgi and nginx.

How can I run this crawler so that when anyone goes to "xx.xx.xx.xx/whats", the crawler runs and returns the csv file?

Arpit Agarwal
  • Does your code really have the line `cmd = "sudo crapy crawl whats"` (note the invocation of a program named `crapy`)? – jpmc26 Aug 17 '15 at 07:22
  • You are using the Python os command `cmd = "sudo crapy crawl whats"`. I think it should be `cmd = "sudo scrapy crawl whats"`. The next point: check your scrapy PATH and correct it. It will then run without any error. :) – Vasim Aug 17 '15 at 07:24
  • Thanks for pointing out the error in the post, I have edited that – Arpit Agarwal Aug 17 '15 at 07:32
  • I answered similar question here: https://stackoverflow.com/questions/36384286/how-to-integrate-flask-scrapy – Pawel Miech May 17 '16 at 08:31

1 Answer


When you use sudo, the shell this starts will ask for a password on the tty - it specifically doesn't read standard input for this information. Since Flask and other web applications typically run detached from a terminal, sudo has no way to ask for a password, so it looks for a program that can provide the password. You can find more information on this topic in this answer.

The reason you aren't finding scrapy is most likely a difference in $PATH between the interactive shells you used in testing and the process that's running Flask. The easiest way to get around this is to give the full path to the scrapy program in your command.
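For illustration, here is a minimal sketch of that fix, assuming the virtualenv lives at /var/www/myapp/venv (that path is an assumption; use whatever your server actually has). It builds the command as an argv list and runs it with an explicit binary path, so it works regardless of the worker's $PATH and without sudo:

```python
import subprocess

# Full path to the scrapy executable inside the deployed virtualenv.
# /var/www/myapp/venv is an assumed location -- adjust for your server.
SCRAPY_BIN = "/var/www/myapp/venv/bin/scrapy"

def build_crawl_command(spider_name):
    """Return the argv list for a crawl, using the full binary path."""
    return [SCRAPY_BIN, "crawl", spider_name]

def run_spider(spider_name, cwd="/var/www/myapp/whats"):
    # No shell, no sudo: uwsgi workers have no tty to prompt on, and an
    # argv list avoids shell quoting issues. cwd replaces os.chdir().
    return subprocess.run(build_crawl_command(spider_name), cwd=cwd)
```

In the Flask view you would then call `run_spider("whats")` instead of `os.system(cmd)`; `subprocess.run` also gives you the exit code, so you can return an error page when the crawl fails rather than sending a stale csv.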

holdenweb
  • Note that one potential cause of this sort of situation is that your virtualenv is active during development but not activated for a deployed version. – jpmc26 Aug 17 '15 at 07:30
  • What do you mean by "shell this starts"? – Arpit Agarwal Aug 17 '15 at 07:34
  • I will try to give the full path and see if that works – Arpit Agarwal Aug 17 '15 at 07:35
  • @holdenweb Thank you, it worked like a charm. Writing the crawler and code took me hardly 3 to 4 hours, but I had been stuck making it live for 2 days. I can't believe it didn't strike me to add the full path. Besides, that was a nice point that "flask and other web applications typically run detached from a terminal" – Arpit Agarwal Aug 17 '15 at 07:54
  • By "shell this starts" I meant the sub-shell in which the `sudo` command is run - a sub-shell must be used because it needs to have a different process environment. Glad you have solved your problem. – holdenweb Aug 17 '15 at 08:10