
Scrapinghub recently removed periodic jobs from their free plan, which is what I used to use to run my Scrapy crawlers.

Therefore, I decided to use Scrapyd instead, and went ahead and got a virtual server running Ubuntu 16.04. (This is my first time setting up and running a server, so please bear with me.)

Following the instructions on scrapyd.readthedocs.io, I installed Scrapyd using pip:

$ pip install scrapyd

(That was after I figured out that the previously recommended way for Ubuntu, using apt-get, is no longer supported; see GitHub.)

Then I logged onto my server via SSH and started Scrapyd by simply running

$ scrapyd

Everything looks fine as far as I can tell:

2017-10-30 17:31:19+0000 [-] Log opened.
2017-10-30 17:31:19+0000 [-] twistd 16.0.0 (/usr/bin/python 2.7.12) starting up.
2017-10-30 17:31:19+0000 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-10-30 17:31:19+0000 [-] Site starting on 6800
2017-10-30 17:31:19+0000 [-] Starting factory <twisted.web.server.Site instance at 0x7f644752bfc8>
2017-10-30 17:31:19+0000 [Launcher] Scrapyd 1.2.0 started: max_proc=4, runner=u'scrapyd.runner'

I would expect to see a web interface (described here) when I go to my IP at http://82.165.102.18:6800.

Instead, I just get the error message "This site can’t be reached 82.165.102.18 refused to connect."

When I try to run Scrapyd locally, everything works just fine, and I get the web interface at http://localhost:6800/.

I have tried disabling the Firewall (UFW), but that didn't help.
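In case it helps, this is roughly how I can check on the server which address port 6800 is bound to (just a sketch, assuming ss from iproute2 and curl are available):

$ ss -tlnp | grep 6800         # shows the address the service is listening on
$ curl http://127.0.0.1:6800/  # check whether the web UI responds from the server itself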

At this point, I am lost. If you have any ideas, please let me know!

Thanks a lot!

Sebastian

1 Answer


If you can reach your Scrapyd instance locally but not over the network, I suspect Scrapyd is listening only on localhost. Make sure you have this line in your scrapyd.conf:

bind_address = 0.0.0.0

It instructs Scrapyd to listen on all interfaces. bind_address defaults to 127.0.0.1, so by default it only listens on localhost.
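For reference, a minimal scrapyd.conf sketch; only bind_address is the actual fix here, and http_port is just shown with its default value. Scrapyd looks for the file in a few places, e.g. /etc/scrapyd/scrapyd.conf or a scrapyd.conf in the directory where you start the daemon:

[scrapyd]
# listen on all interfaces instead of the default 127.0.0.1
bind_address = 0.0.0.0
# default HTTP port for the web UI and JSON API
http_port = 6800

After changing the configuration, restart scrapyd so the new bind address takes effect.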

Tomáš Linhart
  • Thanks a million, Tomas, that was it! – Sebastian Nov 01 '17 at 18:09
  • By the way: I had to create the file scrapyd.conf, as it didn't exist before. I created the file taking [this example file](http://scrapyd.readthedocs.io/en/stable/config.html#example-configuration-file) as a template and changed the bind_address as you suggested. – Sebastian Nov 01 '17 at 18:13
  • Also, I didn't mention it last time, but consider running Scrapyd in Docker. It greatly simplifies things, especially if you deploy at scale. – Tomáš Linhart Nov 01 '17 at 18:52
  • Thanks again for your help, Tomas. Do you think you could maybe have a look at my follow-up question [over here](https://stackoverflow.com/questions/47065225/preferred-way-to-run-scrapyd-in-the-background-as-a-service)? – Sebastian Nov 01 '17 at 23:29
  • I had the same problem on Vultr, and I had to open the port with ufw for it to work: `ufw allow 6800` – Viktor Andersen Jul 08 '23 at 10:36