4

I am new to Scrapy and Python, and I am enjoying them.

Is it possible to debug a Scrapy project using Visual Studio? If so, how?

user3860415
  • 41
  • 1
  • 3

5 Answers

4

I've created an init file named runner.py:

from scrapy.cmdline import execute

# Equivalent to running "scrapy crawl spider_name" from the project folder
execute(['scrapy', 'crawl', 'spider_name'])

You just need to set that file as the startup file in the project options.

It works with Visual Studio 2015.
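If you also want the runner to forward extra options while debugging, here's a minimal sketch; spider_name, the output file name, and the setting override are placeholders, while -o and -s are standard options of the scrapy crawl command:

from scrapy.cmdline import execute

# Same as running: scrapy crawl spider_name -o items.json -s LOG_LEVEL=DEBUG
execute(['scrapy', 'crawl', 'spider_name',
         '-o', 'items.json',        # write scraped items to a feed file
         '-s', 'LOG_LEVEL=DEBUG'])  # override a setting just for this run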

pedrommuller
  • 15,741
  • 10
  • 76
  • 126
2

You can install PTVS (Python Tools for Visual Studio) in Visual Studio 2012. Then create a Python project "From existing Python code" and import your code.

If you are familiar with Visual Studio, it works the same as debugging other languages such as C++/C#: just set some breakpoints and start your script with debugging.

As ThanhNienDiCho said, add "-mscrapy.cmdline crawl your_spider_name" to your interpreter argument.

PTVS screenshot

Yuan
  • 1,147
  • 11
  • 16
  • Thank you Yuan, I have managed to debug Python previously. What I was looking for was to debug Scrapy using VS. However, I have found this link http://pytools.codeplex.com/wikipage?title=Features%20Debugging#launch-modes which elaborates on the debug options in VS. – user3860415 Jul 21 '14 at 12:12
  • I don't think Scrapy is any different from a normal Python project; it's all normal Python script files. What you mentioned is the same PTVS as in my answer. – Yuan Jul 21 '14 at 12:23
1

Well, I tried all of the answers given to the OP and none worked for me. The closest seems to be the one posted by @Rafal Zajac; however, it also failed for me.

I ended up finding the solution here, although some of the answers there no longer work with newer versions either.

So the version that seems to work for me is this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from tutorial.spiders.dmoz_spider import DmozSpider
from sys import stdin

print("init...")
settings = get_project_settings()  # loads the project's settings.py
process = CrawlerProcess(settings)
process.crawl(DmozSpider)          # pass the spider class, not an instance
process.start()                    # blocks until the crawl finishes
x = stdin.read(1)                  # keep the console window open when done

This should go in the startup script; no script arguments are required.
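As a side note, CrawlerProcess can queue more than one spider before start(); a minimal sketch, where the commented-out OtherSpider is a hypothetical second spider class:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from tutorial.spiders.dmoz_spider import DmozSpider
# from tutorial.spiders.other_spider import OtherSpider  # hypothetical second spider

process = CrawlerProcess(get_project_settings())
process.crawl(DmozSpider)
# process.crawl(OtherSpider)  # each crawl() call queues another spider
process.start()               # runs all queued crawls, then returns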

omer schleifer
  • 3,897
  • 5
  • 31
  • 42
  • Thanks for pointing out that my solution doesn't work anymore. I'm back on Scrapy so I had to fix the debugging in VS (again). It looks like there's only a small difference to what I originally suggested. I've updated my answer... – Rafal Zajac Sep 13 '16 at 01:03
0

I had the same problem, and Yuan's initial answer didn't work for me.

To run Scrapy, you need to open cmd.exe and run:

cd "project directory"
scrapy crawl namespider
  • scrapy here is scrapy.bat.
  • namespider is the value of the name field in the spider class (see the sketch after this list).
  • To run Scrapy from Visual Studio, use input parameters of -mscrapy.cmdline crawl your_spider_name. See https://i.stack.imgur.com/KiPUc.jpg.
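For reference, a minimal sketch of where that value comes from (the class name, URL, and parse logic below are illustrative only):

import scrapy

class MySpider(scrapy.Spider):
    # This "name" attribute is what you pass to "scrapy crawl <name>"
    name = "namespider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # Put a breakpoint here to inspect the response while debugging
        yield {"title": response.css("title::text").get()}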
0

UPDATE:

It looks like with version 1.1 of Scrapy you have to change the "Script Arguments" in your project's debug settings to "runspider <spider file name>.py", and it should work as expected:

(screenshot of the Script Arguments setting)


I'm new to Python and Scrapy too, and I think I had exactly the same problem.

I was following a tutorial from Scrapy's website: http://doc.scrapy.org/en/latest/intro/tutorial.html, so first I generated the file structure for the Scrapy project "tutorial".

The next step was to create a new Python project "From existing Python code" and select the top folder "tutorial". When the wizard asks which file types to import, I'd just use *.* to import everything; if you leave the default settings, it won't import the scrapy.cfg file.
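For reference, the layout that startproject generates for the tutorial project looks roughly like this (per the Scrapy tutorial; scrapy.cfg sits at the top level, which is why the default import filter misses it):

tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py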

I guess you got this far and what you just wanted was to put a breakpoint e.g. in the spider class, hit F5 and start debugging?

I tried as suggested:

As ThanhNienDiCho said, add "-mscrapy.cmdline crawl your_spider_name" to your interpreter argument.

In this case you also have to set the startup file - I couldn't figure out this part. You can't use any files from the project because that's not how it works, right? I tried adding dummy.py (an empty file) at the top level as the startup file, but then I was getting a message from Scrapy that "unknown command: crawl" - just the message you would get if you ran the "scrapy" command from outside the project folder. Maybe there is a way to make it work and someone could explain the full setup using this approach? I couldn't get it right.
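For what it's worth, that "unknown command: crawl" message usually means Scrapy couldn't find a scrapy.cfg by walking up from the current working directory. A quick diagnostic sketch, assuming scrapy.utils.conf.closest_scrapy_cfg is available in your Scrapy version:

import os
from scrapy.utils.conf import closest_scrapy_cfg

# Project-only commands like "crawl" are registered only when a scrapy.cfg
# is found by searching upwards from the current working directory.
print("cwd:", os.getcwd())
print("scrapy.cfg:", closest_scrapy_cfg() or "<not found>")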

Finally I noticed that the Linux equivalent of scrapy.bat is a Python file with the following content:

from scrapy.cmdline import execute

# With no arguments, execute() reads the command from sys.argv
execute()

So I replaced my dummy.py with a file scrapy_runner.py (the file name doesn't matter) containing the above content - and that was my startup file.

Now the last thing was to add to Project Properties -> Debug -> Script Arguments the following value:

crawl dmoz

where "dmoz" was the name of the name of the spider from the tutorial.

This setup works for me. I hope this helps.


Rafal Zajac
  • 1,613
  • 1
  • 16
  • 13
  • Why pass "crawl dmoz" as arguments? I get the error "can't open file: crawl"; when passing -mscrapy.cmdline crawl dmoz I get the "unknown command: crawl" error. – omer schleifer May 04 '16 at 08:32
  • I wrote my answer 2 years ago and haven't worked with Scrapy since. From what I remember, you need to pass "crawl dmoz" so that the resulting command executed by Visual Studio when debugging is "python scrapy_runner.py crawl dmoz". The parameters "crawl" and "dmoz" are then used when the function execute() from the file scrapy_runner.py runs. – Rafal Zajac May 05 '16 at 10:57