
I am totally new to Scrapy. I am working on a project in which I need to use Scrapy to crawl this website: https://www.google.com/partners/#a_search;bdgt=10000;lang=en;locn=United%20States;motv=0;wbst=http%253A%252F%252F
I can't pass the whole URL to a request in Scrapy. Debugging in PyCharm, I found that only the part of the URL before the # gets through. Can anybody help me solve this problem? Thanks a lot!

jess1818
  • Hope [this](http://stackoverflow.com/questions/33395133/scrapy-google-crawl-doesnt-work/33395421#33395421) helps – eLRuLL Nov 28 '16 at 20:09
  • I tried [link](https://www.google.com/partners/?a_search....) and [link](https://www.google.com/partners/?search...); neither of them works :( – jess1818 Nov 28 '16 at 21:29
  • Or try PhantomJS + Selenium inside Scrapy .... [look at my answer](http://stackoverflow.com/a/40833619/4094231) – Umair Ayub Dec 01 '16 at 14:48

1 Answer


The URL fragment (the part after #) is not sent to remote web servers; this is how HTTP works. The fragment is handled by the browser after the request is sent; in Google's case it triggers some JavaScript functions, etc.

Scrapy is not a browser - it doesn't evaluate JavaScript; Scrapy just downloads data via HTTP. That's the reason the fragment is stripped from the URL when Scrapy fetches a page - there is no way to use it.
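You can see this split for yourself with Python's standard `urllib.parse` (this is not Scrapy-specific, just a quick illustration that the fragment is a separate, client-side component of the URL):

```python
from urllib.parse import urlsplit

url = ("https://www.google.com/partners/#a_search;bdgt=10000;"
       "lang=en;locn=United%20States;motv=0;wbst=http%253A%252F%252F")

parts = urlsplit(url)

# Only scheme, host, path and query are used to build the HTTP request;
# the fragment never leaves the client.
print(parts.path)      # /partners/
print(parts.fragment)  # a_search;bdgt=10000;lang=en;...
```

Everything Scrapy (or any HTTP client) sends to the server is built from `parts.path` and `parts.query`; `parts.fragment` is simply dropped.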

If you want to handle such URL fragments you have two options:

  1. emulate what the browser is doing - inspect which HTTP requests it makes when you open this URL (e.g. in your browser's developer tools) and reproduce them in Scrapy;
  2. use a browser engine to render the page, e.g. Selenium, PhantomJS or Splash. There is a plugin for Scrapy + Splash integration: https://github.com/scrapy-plugins/scrapy-splash.
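For option 1, a first step is recovering the parameters encoded in the fragment, since those are what the page's JavaScript turns into real requests. The sketch below parses this particular fragment format (a view name followed by `;`-separated `key=value` pairs); which endpoint to send the parameters to is something you'd have to discover in the browser's network tab - it is not shown here:

```python
from urllib.parse import unquote

fragment = ("a_search;bdgt=10000;lang=en;locn=United%20States;"
            "motv=0;wbst=http%253A%252F%252F")

# First ';'-separated token names the view; the rest are key=value pairs.
view, *pairs = fragment.split(";")
params = {k: unquote(v)
          for k, v in (pair.split("=", 1) for pair in pairs)}

print(view)    # a_search
print(params)  # {'bdgt': '10000', 'lang': 'en', 'locn': 'United States', ...}
```

These decoded values can then be passed as form data or query parameters in a `scrapy.Request`/`scrapy.FormRequest` once you know the real endpoint the page calls.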
Mikhail Korobov