I installed Docker, scrapyjs, and Splash per alexce's instructions here.

Then, running `docker run -p 8050:8050 scrapinghub/splash`, I get this output:

2016-05-08 17:17:45+0000 [-] Log opened.
2016-05-08 17:17:45.978866 [-] Splash version: 2.1
2016-05-08 17:17:45.979553 [-] Qt 5.5.1, PyQt 5.5.1, WebKit 538.1, sip 4.17, Twisted 16.1.1, Lua 5.2
2016-05-08 17:17:45.980138 [-] Python 3.4.3 (default, Oct 14 2015, 20:28:29) [GCC 4.8.4]
2016-05-08 17:17:45.980401 [-] Open files limit: 1048576
2016-05-08 17:17:45.981020 [-] Can't bump open files limit
2016-05-08 17:17:46.086232 [-] Xvfb is started: ['Xvfb', ':1', '-screen', '0', '1024x768x24']
2016-05-08 17:17:46.161902 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2016-05-08 17:17:46.260357 [-] verbosity=1
2016-05-08 17:17:46.260607 [-] slots=50
2016-05-08 17:17:46.261170 [-] argument_cache_max_entries=500
2016-05-08 17:17:46.262476 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2016-05-08 17:17:46.264565 [-] Site starting on 8050
2016-05-08 17:17:46.265203 [-] Starting factory <twisted.web.server.Site object at 0x7f270ec81e10>

And it hangs there. I tried troubleshooting based on the instructions here (reinstalled Docker, verified the VM is running, regenerated certs, set the env), but still nothing.

My settings file and Spider

When I run the spider without the `meta=` parameter in the `yield scrapy.Request` call within `start_requests`, the spider runs fine (except that the dynamic fields are not captured), so I'm not sure whether this is an issue with Docker or with Splash.

Thanks in advance.

Benjamin James

1 Answer

When you run Splash with Docker, the console does indeed just "hang there" at first. This is expected: Splash is waiting for a request on one of its endpoints.

$ sudo docker run -p 8050:8050 scrapinghub/splash
2016-05-09 10:21:42+0000 [-] Log opened.
2016-05-09 10:21:42.773541 [-] Splash version: 2.1
2016-05-09 10:21:42.774298 [-] Qt 5.5.1, PyQt 5.5.1, WebKit 538.1, sip 4.17, Twisted 16.1.1, Lua 5.2
2016-05-09 10:21:42.774453 [-] Python 3.4.3 (default, Oct 14 2015, 20:28:29) [GCC 4.8.4]
2016-05-09 10:21:42.774632 [-] Open files limit: 1048576
2016-05-09 10:21:42.774842 [-] Can't bump open files limit
2016-05-09 10:21:42.879868 [-] Xvfb is started: ['Xvfb', ':1', '-screen', '0', '1024x768x24']
2016-05-09 10:21:43.072351 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2016-05-09 10:21:43.214478 [-] verbosity=1
2016-05-09 10:21:43.214617 [-] slots=50
2016-05-09 10:21:43.214703 [-] argument_cache_max_entries=500
2016-05-09 10:21:43.215195 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2016-05-09 10:21:43.217494 [-] Site starting on 8050
2016-05-09 10:21:43.217635 [-] Starting factory <twisted.web.server.Site object at 0x7f529d0fee48>

To test if Splash is running correctly, try its web UI at http://localhost:8050/

You should see the Splash web UI page (screenshot omitted).

You can then try entering a URL and clicking "Render me!"
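If you prefer to check from a script rather than a browser, the same test can be sketched in Python. This is only a minimal sketch using the standard library: the function name `splash_is_up` is made up for illustration, and the default host and port simply mirror the `-p 8050:8050` mapping above. If Docker runs inside a VM on your machine, the host to probe may be the VM's address rather than `localhost`; that is an assumption about your setup, not something the log output tells you.

```python
import socket
import urllib.error
import urllib.request


def splash_is_up(host="localhost", port=8050, timeout=5.0):
    """Return True if an HTTP server answers successfully on host:port.

    Hypothetical helper: it only checks that the root URL responds,
    it does not verify that the server is really Splash.
    """
    url = "http://{}:{}/".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Any 2xx/3xx status means something is serving HTTP there.
            return 200 <= response.status < 400
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # Connection refused, timeout, DNS failure or HTTP error status.
        return False
```

Calling `splash_is_up()` returns `False` when the connection is refused, which is the same symptom a browser reports as "refused to connect"; passing a different `host` lets you probe another address without changing anything else.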

paul trmbrth
  • Thanks again. I have gotten that far, but when I go to localhost it says it can't be reached: "localhost refused to connect". Any tips? – Benjamin James May 09 '16 at 20:39
  • As I said, it may be easier if I could see `settings.py` – Benjamin James May 09 '16 at 20:53
  • `settings.py` is not relevant for Splash server. I don't know what is causing the refused connection to localhost on port 8050. Do you have something else running on that port? – paul trmbrth May 09 '16 at 21:17
  • No, I ran `lsof -i | grep LISTEN` but 8050 was not listed as being used. Since I'm on a Mac, our sysadmin says I may need to double-expose the ports (on the VM and on my machine). Any advice on that? Sorry to pester, but I noticed you work for Scrapinghub. – Benjamin James May 10 '16 at 14:56