
I'm having some problems with using Scrapy-Splash together with an HTTP proxy (see "500 Internal Server Error" when combining Scrapy over Splash with an HTTP proxy), even when I try to set a proxy profile following http://splash.readthedocs.io/en/latest/api.html#proxy-profiles.

In order to understand better what is going on, I was looking for the part of the Scrapy-Splash source code, https://github.com/scrapy-plugins/scrapy-splash, which parses the proxy host and port specified in the .ini file in /etc/splash/proxy-profiles.

However, searches for "proxy" or ".ini" in the repository didn't yield any results. Can someone explain to me how the proxy profiling is implemented in Scrapy-Splash?

Kurt Peek
  • Sorry, what's the problem exactly? Have you read http://splash.readthedocs.io/en/stable/api.html#proxy-profiles, does it work for you? – Mikhail Korobov Jul 20 '17 at 20:34
  • @MikhailKorobov The world would be a nicer place if anyone gave a concrete example of how to use scrapy-splash with proxy profiles. What should be in the directory? What should be in the args passed to scrapy-splash? Not everyone is a Docker expert either. – rubmz Jul 16 '21 at 13:03

1 Answer


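First, proxy profiles are a feature of Splash itself rather than of the Scrapy-Splash plugin, which is why searching the scrapy-splash repository turns up nothing. Splash looks for profiles in /etc/splash/proxy-profiles; if you are running Splash in a container, you can mount the host's proxy-profiles directory into the container with -v, e.g.: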

sudo docker run -p 8050:8050 -v /etc/splash/proxy-profiles:/etc/splash/proxy-profiles scrapinghub/splash
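For reference, a profile in that directory is a plain .ini file with a [proxy] section. The sketch below generates one with Python's configparser. The host proxy.example.com, the port 8010 and the file name myproxy.ini are placeholders rather than values from the question, and the section and key names (host, port, type) are taken from the Splash proxy-profiles documentation as I understand it, so check them against your Splash version.

```python
import configparser
import io


def build_proxy_profile(host, port, proxy_type="HTTP"):
    """Return the text of a Splash-style proxy profile (.ini) as a string."""
    config = configparser.ConfigParser()
    # Placeholder values are passed in by the caller; only [proxy] is written here.
    config["proxy"] = {"host": host, "port": str(port), "type": proxy_type}
    buffer = io.StringIO()
    # Write key=value without surrounding spaces, as in the Splash docs examples.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical proxy address; replace with your own.
profile_text = build_proxy_profile("proxy.example.com", 8010)
print(profile_text)
```

Saving that text as /etc/splash/proxy-profiles/myproxy.ini on the host would make a profile named myproxy visible inside the container started with the command above.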

Second, when requesting a URL through Splash, a proxy parameter is needed if the proxy profile is not named default.ini. Its value is the profile's file name without the .ini extension, e.g.:

localhost:8050/render.html?url=http://target.com&wait=1&timeout=2&proxy=filename
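The same request can be assembled programmatically, which takes care of escaping the target URL. This is only a sketch: splash_render_url is a hypothetical helper, and the Splash address, target URL and profile name filename are the placeholders used above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def splash_render_url(target_url, profile, splash="http://localhost:8050", **extra):
    """Build a render.html URL that asks Splash to use the named proxy profile."""
    # "proxy" carries the profile name; extra holds options such as wait/timeout.
    params = {"url": target_url, "proxy": profile, **extra}
    return f"{splash}/render.html?{urlencode(params)}"


request_url = splash_render_url("http://target.com", "filename", wait=1, timeout=2)
print(request_url)
```

From scrapy-splash the equivalent would presumably be to pass the profile name in the request's args dict, e.g. SplashRequest(url, args={'proxy': 'filename'}), since args is forwarded to the Splash endpoint.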
Tom Aranda
Leon