Scrapy Shell - How to change USER_AGENT

Question

I have a fully functioning scrapy script to extract data from a website. During setup, the target site banned me based on my USER_AGENT information. I subsequently added a RotateUserAgentMiddleware to rotate the USER_AGENT randomly. This works great.

However, now when I try to use the scrapy shell to test XPath and CSS selectors, I get a 403 error. I'm sure this is because the USER_AGENT of the scrapy shell is defaulting to some value the target site has blacklisted.

Question: is it possible to fetch a URL in the scrapy shell with a different USER_AGENT than the default?

fetch('http://www.test') [add something ?? to change USER_AGENT]

Thx

possible duplicate of [Scrapy Python Set up User Agent](http://stackoverflow.com/questions/18920930/scrapy-python-set-up-user-agent) — Sylvain Leroux, Aug 21 '14 at 15:23
different issue. I am able to change the USER_AGENT in settings.py no problem. I'm trying to change the setting under scrapy shell: http://doc.scrapy.org/en/latest/topics/shell.html — dfriestedt, Aug 21 '14 at 16:41

marven · Accepted Answer · 2014-08-22T03:24:39.167

61

scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com'

edited Aug 22 '14 at 03:24

answered Aug 22 '14 at 01:15


Do you know how to also add headers to scrapy shell? Thanks. – Computer's Guy May 03 '16 at 17:17

I got here because I was running the shell from outside the project directory and my settings file was being ignored. Once I changed into the project directory, the custom `USER_AGENT` setting worked properly, no need to pass any extra parameter to the `scrapy shell` command. – Ariel Aug 13 '17 at 15:46
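
For reference, the project-level setting mentioned in the comment above is a single line in the project's settings.py. This is only a minimal sketch: the user agent string is a placeholder assumption, not a value taken from this thread.

```python
# settings.py inside the Scrapy project package.
# Placeholder value (an assumption for illustration); substitute a
# user agent string that the target site accepts.
USER_AGENT = 'Mozilla/5.0 (compatible; MyBot/1.0)'
```

When `scrapy shell` is started from inside the project directory, it should pick this value up without any extra command-line parameter, as the comment describes.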

score 16 · Answer 2 · answered Oct 19 '16 at 15:57


Inside the scrapy shell, you can set the User-Agent in the request header.

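# The shell provides `fetch`; run `import scrapy` first if `scrapy` is not defined
url = 'http://www.example.com'
request = scrapy.Request(url, headers={'User-Agent': 'Mybot'})
# Fetch the request; the shell's `response` object is updated with the result
fetch(request)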
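
The same `headers` argument can carry more than the User-Agent, which may also cover the earlier comment asking how to add headers in the shell. Below is a rough sketch under stated assumptions: `build_headers` is a hypothetical helper written for this example (it is not part of Scrapy), and the extra header values are placeholders.

```python
# Hypothetical helper (not a Scrapy API): combine a User-Agent with
# any additional request headers into one plain dict.
def build_headers(user_agent, extra=None):
    headers = {'User-Agent': user_agent}
    if extra:
        # Keys in `extra` override earlier ones, including 'User-Agent'.
        headers.update(extra)
    return headers

# Placeholder header values, chosen only for illustration.
headers = build_headers('Mybot', {
    'Accept-Language': 'en-US',
    'Referer': 'http://www.example.com',
})

# Inside the scrapy shell the dict can then be passed to a request:
#   request = scrapy.Request('http://www.example.com', headers=headers)
#   fetch(request)
```

The last two lines are left as comments because `fetch` exists only inside a running scrapy shell session.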


salmanwahed

