I am scraping a pool of URLs with a Python/Selenium setup using PhantomJS and custom HTTP headers.

Some specific URLs block the request and forward it to dontuseabot.css.

However, when I manually edit the capitalization of some header fields and replay/reissue the request using Fiddler, I can reach the URL of interest without seeing dontuseabot.css.

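According to the HTTP specification, header field names are case-insensitive, but this server apparently treats them as case-sensitive. How can I force a specific uppercase/lowercase spelling of request header fields in PhantomJS? Setting up a rewrite rule in Fiddler seems like an inefficient solution, much like patching PhantomJS.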

sudonym
  • You could download the PhantomJS sources, fix the header case, and then build a binary, but using a proxy to correct the headers is probably a simpler solution. – Vaviloff Dec 27 '17 at 04:39
  • It is hard to believe that no one has done this already. Is there something like an inventory of available PhantomJS patches? – sudonym Dec 27 '17 at 04:47
  • There are about a dozen pull requests: https://github.com/ariya/phantomjs/pulls Please note that PhantomJS is no longer actively developed now that there is [Puppeteer](https://github.com/GoogleChrome/puppeteer/) – Vaviloff Dec 27 '17 at 12:25
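
For reference, the header rewrite that a correcting proxy would have to perform, as suggested in the comments, can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function name `recase_headers` and the `casing` mapping are hypothetical, it operates on the raw bytes of an unencrypted HTTP/1.x request head, and it is not part of PhantomJS, Selenium, or Fiddler. For HTTPS traffic the proxy would additionally need to intercept TLS.

```python
def recase_headers(raw_request: bytes, casing: dict) -> bytes:
    """Rewrite header field names in a raw HTTP/1.x request.

    `casing` (hypothetical parameter) maps lowercase field names to the
    exact spelling that should be sent, e.g. {"user-agent": "User-Agent"}.
    Header values, the request line, and the body are left untouched.
    """
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    out = [lines[0]]  # request line, e.g. b"GET / HTTP/1.1"
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        key = name.decode("latin-1").strip().lower()
        if colon and key in casing:
            name = casing[key].encode("latin-1")
        out.append(name + colon + value)
    return b"\r\n".join(out) + sep + body


# Example: restore the conventional capitalization of two fields.
raw = (
    b"GET / HTTP/1.1\r\n"
    b"host: example.com\r\n"
    b"user-agent: x\r\n"
    b"accept: */*\r\n"
    b"\r\n"
)
fixed = recase_headers(raw, {"host": "Host", "user-agent": "User-Agent"})
```

Fields that are not listed in the mapping (here `accept`) pass through with whatever spelling the client produced, so the mapping only needs to cover the fields the target server is picky about.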
