I am new to Scrapy and Python and I am enjoying it.
Is it possible to debug a scrapy project using Visual Studio? If it is possible, how?
I've created an init file named runner.py with the following content:
from scrapy.cmdline import execute
execute(['scrapy','crawl', 'spider_name'])
You just need to set that file as the startup file in the project properties.
It works with Visual Studio 2015.
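If Visual Studio launches runner.py with a working directory outside the Scrapy project, the crawl command fails with "unknown command: crawl". Here is a minimal sketch of a more robust runner that first locates the directory containing scrapy.cfg (the helper find_project_dir is my own name, not a Scrapy API):

```python
import os

def find_project_dir(start):
    """Walk upward from `start` until a directory containing scrapy.cfg
    is found; Scrapy's 'crawl' command only works from inside one."""
    d = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(d, 'scrapy.cfg')):
            return d
        parent = os.path.dirname(d)
        if parent == d:  # reached the filesystem root without finding it
            raise FileNotFoundError('no scrapy.cfg found above %s' % start)
        d = parent

# In runner.py you would then do (requires Scrapy to be installed):
#   os.chdir(find_project_dir(os.path.dirname(os.path.abspath(__file__))))
#   from scrapy.cmdline import execute
#   execute(['scrapy', 'crawl', 'spider_name'])
```

With this, the startup file can be launched from any working directory and the crawl command will still be recognized.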
You can install PTVS in Visual Studio 2012. Then create a Python project "From existing Python code" and import your code.
If you are familiar with Visual Studio, it's the same as debugging other languages in Visual Studio, like C++/C#. Just set some breakpoints and start your script with debugging.
As ThanhNienDiCho said, add "-mscrapy.cmdline crawl your_spider_name" to your interpreter argument.
Well, I tried all of the answers given to the OP and none worked for me. The closest seems to be the one posted by @Rafal Zajac, but it also failed for me.
I ended up finding the solution here, although some of the answers there no longer work with newer versions either.
The version that works for me is this:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from sys import stdin

from tutorial.spiders.dmoz_spider import DmozSpider

print("init...")

# Pass the spider class, not an instance; CrawlerProcess instantiates it.
settings = get_project_settings()
process = CrawlerProcess(settings)
process.crawl(DmozSpider)
process.start()

# Keep the console window open after the crawl finishes.
x = stdin.read(1)
This should be in the startup script, no script arguments are required.
I had the same problem, and Yuan's initial answer didn't work for me.
To run Scrapy from the command line, you would normally open cmd.exe and run:
cd "project directory"
scrapy crawl namespider
To get the same effect in Visual Studio, set the interpreter argument to
-mscrapy.cmdline crawl your_spider_name
See https://i.stack.imgur.com/KiPUc.jpg.
UPDATE: It looks like with version 1.1 of Scrapy you have to change the "Script Arguments" in your project debug settings to "runspider <spider file name>.py" and it should work as expected.
I'm new to python and scrapy too and I think I had exactly the same problem.
I was following a tutorial from Scrapy's website: http://doc.scrapy.org/en/latest/intro/tutorial.html, so first I generated the file structure for the scrapy project "tutorial".
The next step was to create a new Python project "From existing Python code" and select the top folder "tutorial". When the wizard asks which file types to import, I'd just use *.* to import everything. If you leave the default settings, it won't import the file scrapy.cfg.
I guess you got this far and what you just wanted was to put a breakpoint e.g. in the spider class, hit F5 and start debugging?
I tried as suggested:
As ThanhNienDiCho said, add "-mscrapy.cmdline crawl your_spider_name" to your interpreter argument.
In this case you also have to set the startup file - I couldn't figure out this part. You can't use any files from the project because that's not how it works, right? I tried adding dummy.py (an empty file) at the top level as the startup file, but then I got the message from Scrapy "unknown command: crawl" - just the message you would get if you ran the "scrapy" command outside the project folder. Maybe there is a way to make it work and someone could explain the full setup using this approach? I couldn't get it right.
Finally I noticed that the Linux equivalent of scrapy.bat is a Python file with the following content:
from scrapy.cmdline import execute
execute()
So I replaced my dummy.py with the file scrapy_runner.py (the file name doesn't matter) containing the above content - and that was my startup file.
Now the last thing was to add the following value under Project Properties -> Debug -> Script Arguments:
crawl dmoz
where "dmoz" was the name of the spider from the tutorial.
This setup works for me. I hope this helps.
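For what it's worth, the "Script Arguments" approach works because scrapy.cmdline.execute() falls back to sys.argv when called with no arguments, so whatever Visual Studio passes to the startup script becomes the Scrapy command line. A small illustration of that fallback (effective_argv is a stand-in I wrote, not actual Scrapy code):

```python
import sys

def effective_argv(argv=None):
    # Stand-in for the fallback scrapy.cmdline.execute() performs:
    # with no explicit argv, the process arguments are used, so
    # Visual Studio's "Script Arguments" become the Scrapy command.
    return list(sys.argv) if argv is None else list(argv)

# Simulate launching scrapy_runner.py with Script Arguments "crawl dmoz"
sys.argv = ['scrapy_runner.py', 'crawl', 'dmoz']
print(effective_argv())  # → ['scrapy_runner.py', 'crawl', 'dmoz']
```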