
I'm having a bit of trouble. I want to run a shell command from Python in a specific directory. Based on code I found online, I need the following imports:

import os
import subprocess
import shlex

The code itself is below:

os.chdir('/etc/test/')
cmd = 'scrapy crawl test'
subprocess.call(shlex.split(cmd))

I am trying to run the command "scrapy crawl test" in the /etc/test/ directory. When I run it manually in a terminal it works fine, but when I run it from this Python code I get an error:

INFO Exception occured while scraping: [Errno 2] No such file or directory

Can anyone tell me whether my code is incorrect, or whether I am going about this the wrong way?

Jimmy
  • Is there any additional traceback information, or just that one-line error? – abarnert Aug 19 '13 at 20:36
  • As a side note, `cmd = ['scrapy', 'crawl', 'test']` then `subprocess.call(cmd)` is simpler, and probably harder to get wrong; no need to use `shlex` here. But that won't affect the problem you're trying to solve. – abarnert Aug 19 '13 at 20:37
  • @abarnert I can't find any further traceback information, I'm afraid. Would I still need the os.chdir command as well in your case? – Jimmy Aug 19 '13 at 20:40
  • You should not have the final '/' after test. Also, since you don't give an absolute path, scrapy needs to be in a directory in PATH. – stark Aug 19 '13 at 20:40
  • @Jimmy: Yes. (Well, there are _other_ ways you could get the same benefit… but the point is that passing a list instead of using `shlex` to get the exact same list won't affect anything.) – abarnert Aug 19 '13 at 20:41
  • @stark: The final `/` doesn't hurt anything. And obviously `scrapy` is on the PATH because he's getting an error message from `scrapy`. – abarnert Aug 19 '13 at 20:41
  • Use `subprocess.check_output(shlex.split(cmd))`. What does it say? – hek2mgl Aug 19 '13 at 20:43
  • @Jimmy: Speaking of other ways you could get the same benefit: First, you can try passing an absolute path for `test` (`/etc/test/test`) instead of doing a `chdir`. But, more simply… why are you writing a Python script to run the `scrapy` command-line helper in the first place? Why not just use `scrapy` from within Python? – abarnert Aug 19 '13 at 20:43
  • Running scrapy from Python seems fairly complicated, whereas running it from the command line is really straightforward, so I was thinking of going that route. – Jimmy Aug 19 '13 at 20:53
  • @Jimmy: `scrapy` is a Python library. Have you gone through the Getting Started and Tutorial stuff? Everything you want to do from within Python, you can do in Python. Everything you want to do at the shell or in a cronjob or whatever, you can do with the command line tool. If you're trying to run the command-line tool from within Python, you're probably making a mistake earlier on in the process… but it's hard to be sure what that is without more information on what you're trying to do. – abarnert Aug 19 '13 at 20:54
  • @Jimmy: At any rate, the error you're seeing is coming from `scrapy`, not from your code. That could mean there's a bug in your spider, or your directory layout isn't what you expect, or a million other things. Have you tried using `scrapy shell` to debug it, as described in the tutorial? – abarnert Aug 19 '13 at 20:55
  • To run a subprocess in a different directory, you could use the `cwd` argument: `subprocess.check_call(shlex.split(cmd), cwd='/etc/test/')` (remove `os.chdir` in this case). – jfs Aug 20 '13 at 08:33
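
Pulling the comments above together, here is a minimal sketch of the `cwd` approach, with an up-front check that the executable is actually on PATH. The helper name `run_in_directory` is made up for illustration and is not part of `subprocess`. It assumes Python 3.7+ (for `capture_output`). The demo at the bottom runs the Python interpreter itself so that it works without Scrapy installed.

```python
import shlex
import shutil
import subprocess
import sys


def run_in_directory(cmd, cwd):
    """Run cmd (a string or an argument list) with cwd as its working directory.

    Hypothetical helper for illustration only.
    """
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)

    # Fail early with a clear message if the executable cannot be found.
    # Without this check, subprocess raises FileNotFoundError ([Errno 2]) itself.
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"executable not found on PATH: {args[0]}")

    # cwd= sets the working directory of the child process only;
    # the parent script's working directory is left unchanged.
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(
        args, cwd=cwd, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Demo: ask a child Python process to print its working directory.
    result = run_in_directory(
        [sys.executable, "-c", "import os; print(os.getcwd())"], cwd="."
    )
    print(result.stdout.strip())
```

For the case in the question, the call would presumably look like `run_in_directory('scrapy crawl test', cwd='/etc/test')`. Note that a nonexistent `cwd` can also surface as `[Errno 2] No such file or directory` on POSIX systems, so it is worth checking both the directory and the executable. If the error still appears with both in place, it is most likely raised inside the spider, as suggested above.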

1 Answer


Why are you using subprocess? A common practice for running Scrapy from a script is to use Twisted's reactor. Taken from the docs:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here
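
The snippet above reflects the Scrapy API as documented in 2013. More recent Scrapy releases document `CrawlerProcess` for this purpose, so a rough equivalent might look like the sketch below. The function name `run_spider` and its parameters are hypothetical. The sketch assumes a recent Scrapy version and a project directory containing `scrapy.cfg`, neither of which the original post confirms.

```python
import os


def run_spider(project_dir, spider_name):
    """Run a spider by name from inside a Scrapy project directory.

    Hypothetical helper; assumes a recent Scrapy release is installed.
    """
    # Imported here so this module can be loaded without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Scrapy looks for scrapy.cfg starting from the working directory,
    # so switch to the project directory before loading the settings.
    os.chdir(project_dir)

    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)
    process.start()  # starts the Twisted reactor and blocks until the crawl ends
```

With the layout from the question, this would be called as `run_spider('/etc/test', 'test')`. Because the Twisted reactor cannot be restarted, this is only suitable for one crawl run per process.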

There are plenty of examples out there.

Hope that helps.

alecxe