
I have searched a lot on this; it may have a simple solution that I am missing.

I have set up Scrapy + Scrapyd on both my local machine and my server. Both work fine when I start them with the scrapyd command.

I can deploy locally without a problem, I can access localhost:6800 from the browser, and I can run spiders locally.

After running scrapyd on the remote server, I try to deploy to http://remoteip:6800/ the same way I deployed locally,

and I get:

Packing version 1500333306
Deploying to project "projectX" in http://remoteip:6800/addversion.json
Deploy failed: <urlopen error [Errno 111] Connection refused>

I also can't access http://remoteip:6800/ from my local PC, but I can access it from an SSH session on the remote PC (with curl).

I opened inbound and outbound connections on the remote server. What else am I missing?
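For reference, I deploy with scrapyd-deploy, and the target in my scrapy.cfg is set up roughly like this (the target name "remote" is just what I call it):

[deploy:remote]
url = http://remoteip:6800/
project = projectX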

Thanks

Mehmet Kurtipek

2 Answers


First, check whether it is running: run curl localhost:6800 on the server where Scrapyd is running.
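If Scrapyd is up, curl returns the HTML of the web console. For a machine-readable check you can also hit the daemonstatus.json endpoint (the sample output below is illustrative):

curl http://localhost:6800/daemonstatus.json
{"node_name": "myserver", "status": "ok", "pending": 0, "running": 0, "finished": 0}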

Check whether the firewall is enabled:

sudo ufw status

Ideally, just allow TCP connections to port 6800 instead of disabling the firewall. To do so:

sudo ufw allow 6800/tcp
sudo ufw reload

Check your scrapyd.conf and set

bind_address=0.0.0.0

instead of

bind_address=127.x.x.x

0.0.0.0 makes Scrapyd accessible to incoming connections from outside the server/instance, not only from localhost.
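For example, the relevant part of scrapyd.conf on the server would read (see the second answer for where Scrapyd looks for this file):

[scrapyd]
bind_address = 0.0.0.0
http_port    = 6800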

Then stop Scrapyd; I use killall scrapyd to do that.

Then restart Scrapyd by running the scrapyd command.
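Putting both steps together (killall comes from the psmisc package on most Linux distributions):

killall scrapyd
scrapyd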


Note: If you want to keep Scrapyd running even after you disconnect from the server, do this:

nohup scrapyd >& /dev/null &

Also see my answer on setting Scrapyd up as a system service.
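As a rough sketch, a systemd unit along these lines will keep Scrapyd running across disconnects and reboots; the ExecStart path, the User, and the WorkingDirectory are assumptions, so adjust them for your install:

# /etc/systemd/system/scrapyd.service
[Unit]
Description=Scrapyd web service
After=network.target

[Service]
# Assumed install path; verify with: which scrapyd
ExecStart=/usr/local/bin/scrapyd
# Assumed unprivileged user; change to whatever owns your project
User=ubuntu
# Directory where Scrapyd's relative eggs/logs/dbs paths resolve (assumption)
WorkingDirectory=/var/lib/scrapyd
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then enable and start it with sudo systemctl enable --now scrapyd.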

Umair Ayub
    spent the last 8 hours on this and bind_address=0.0.0.0 was the answer. Thanks! – Jeff Borden Mar 01 '18 at 20:27
    This is especially helpful for Docker deployments. I tried everything to get scrapyd to serve its admin page inside a Docker container while it worked fine on my host system. Creating a config file and using bind_address=0.0.0.0 solved it. Here's an example config setting: https://scrapyd.readthedocs.io/en/stable/config.html – Ben Wilson Apr 14 '20 at 05:35

I know this answer may be late, but I hope it can help others like me.

According to the official documentation, Scrapyd searches for its configuration file in these places:

  • /etc/scrapyd/scrapyd.conf (Unix)
  • c:\scrapyd\scrapyd.conf (Windows)
  • /etc/scrapyd/conf.d/* (in alphabetical order, Unix)
  • scrapyd.conf
  • ~/.scrapyd.conf (user's home directory)

So you need to create a scrapyd.conf file and put your configuration in it.

Here is an example configuration file with all the defaults from the documentation:

[scrapyd]
eggs_dir    = eggs
logs_dir    = logs
items_dir   =
jobs_to_keep = 5
dbs_dir     = dbs
max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port   = 6800
debug       = off
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus

All you need to do is change bind_address to 0.0.0.0 and restart Scrapyd.
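Once Scrapyd is restarted with that setting, you should be able to reach it from your own machine, for example (substitute your server's IP for remoteip):

curl http://remoteip:6800/daemonstatus.json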

Max Peng