Questions tagged [scrapy-shell]

The Scrapy shell is an interactive shell where you can try and debug your scraping code very quickly, without having to run the spider.

It’s meant to be used for testing data extraction code, but you can actually use it for testing any kind of code as it is also a regular Python shell.

177 questions
27 votes, 3 answers

Scrapy Shell and Scrapy Splash

We've been using the scrapy-splash middleware to pass the scraped HTML source through the Splash JavaScript engine running inside a Docker container. If we want to use Splash in the spider, we configure several required project settings and yield a…
alecxe
20 votes, 1 answer

Set headers for scrapy shell request

I know that you can run scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com' to change the USER_AGENT, but how do you add request headers?
Computer's Guy
11 votes, 3 answers

Why am I getting this error in scrapy - python3.7 invalid syntax

I've had a heck of a time installing scrapy. I have it installed on my mac but I am getting this error when running the tutorial: Virtualenvs/scrapy_env/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154 def write(self, data,…
user3408397
11 votes, 2 answers

How to disable robots.txt when you launch scrapy shell?

I use the Scrapy shell without problems on several websites, but I run into problems when a site's robots.txt does not allow access. How can I disable robots.txt detection in Scrapy (that is, ignore that the file exists)? Thank you in advance. I'm not talking…
DARDAR SAAD
10 votes, 2 answers

How can I use scrapy shell with a URL and basic auth credentials?

I want to use scrapy shell and test response data for a URL that requires basic auth credentials. I checked the scrapy shell documentation but couldn't find it there. I tried scrapy shell 'http://user:pwd@abc.com' but it didn't work. Does…
Rohanil
9 votes, 3 answers

Scrapy shell against a local file

Before Scrapy 1.0, I could've run the Scrapy Shell against a local file quite simply: $ scrapy shell index.html After upgrading to 1.0.3, it started to throw an error: $ scrapy shell index.html 2015-10-12 15:32:59 [scrapy] INFO: Scrapy 1.0.3…
alecxe
6 votes, 1 answer

Scrapy shell returns without response

I have a little problem crawling a website with Scrapy. I followed the Scrapy tutorial to learn how to crawl a website and wanted to test it on the site 'https://www.leboncoin.fr', but the spider doesn't work. So I tried: scrapy shell…
Chris PERE
4 votes, 1 answer

Scrapy - 301 redirect in shell

I cannot find a solution to the following problem. I am using Scrapy (latest version) and am trying to debug a spider. Using scrapy shell https://jigsaw.w3.org/HTTP/300/301.html, it does not follow the redirect (it is using a default spider to…
Pixelartist
3 votes, 1 answer

How can I use scrapy middleware in the scrapy Shell?

In a Scrapy project one uses middleware quite often. Is there a generic way of enabling middleware in the scrapy shell during interactive sessions as well?
thinwybk
3 votes, 2 answers

Scrapy: why can't I extract my targeted data from Weather Underground?

I am new to Python and web scraping, and this is my first ever question on Stack Overflow. I watched several tutorials and then tried to extract data from the table on this page: https://www.wunderground.com/hourly/ir/tehran/date/2021-04-14. The…
Neil
3 votes, 1 answer

Scrapy downloads the HTML page but I can't get data using XPath or CSS

I am trying to scrape this page. When I run scrapy shell "https://redsea.com/en/apple-iphone-x-64gb-silver.html", it downloads the HTML page and I can view the downloaded HTML with view(response) in the browser. But when I try to get any data -product…
Javed
3 votes, 1 answer

Scrapy Error: 'NotSupported: Unsupported URL scheme '': no handler available for that scheme'

I am trying to scrape a site, but while running the script I'm getting the following error: 'NotSupported: Unsupported URL scheme '': no handler available for that scheme'. If the rule is not wrong, why does it occur, and what's your suggestion, please…
Samsul Islam
3 votes, 1 answer

python convert chinese characters in url

I have a url like href="../job/jobarea.asp?C_jobtype=經營管理主管&peoplenumber=151", this is shown in inspect element. But when opened in new tab it is showing as ../job/jobarea.asp?C_jobtype=%B8g%C0%E7%BA%DE%B2z%A5D%BA%DE&peoplenumber=151 How do I…
Dev Pandu
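
For the percent-encoded query string quoted in the question above, here is a rough sketch of how the two forms relate. It assumes the site encodes its URLs as Big5, which the bytes shown suggest but the question does not state; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# Value of C_jobtype as shown in the page source
job_type = "經營管理主管"

# Percent-encode using Big5 (an assumption about the site's encoding)
encoded = quote(job_type, encoding="big5")

# Decoding with the same charset recovers the original text
decoded = unquote(encoded, encoding="big5")

print(encoded)
print(decoded == job_type)
```

If the site used a different charset, the same two calls apply with another `encoding` argument.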
2 votes, 0 answers

I can't open scrapy shell in the Anaconda shell

I was trying to start the scrapy shell in Anaconda, but this error occurred multiple times: [scrapy.core.downloader.handlers] ERROR: Loading "scrapy.core.downloader.handlers.http.HTTPDownloadHandler" for scheme "http" Traceback (most recent call…