Apologies for the repost; the title of my earlier post was confusing. In the spider example below, how can I use PyInstaller (or some other installer) to build an executable (such as myspidy.exe) so that the end user doesn't need to install Python and Scrapy on Windows? With Python and Scrapy installed, the spider is run with the command "scrapy crawl quotes". The end user would download and run "myspidy.exe" on a Windows PC that doesn't have Python or Scrapy preinstalled. Thanks so much!

import scrapy
class QuotesSpider(scrapy.Spider): 
    name = "quotes"
    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Thank you, Evhz. I changed the code as you suggested and got the errors below at run time.

D:\craftyspider\spidy\spidy\spiders\dist>.\runspidy
Traceback (most recent call last):
File "spidy\spiders\runspidy.py", line 35, in <module>
File "site-packages\scrapy\crawler.py", line 249, in __init__
File "site-packages\scrapy\crawler.py", line 137, in __init__
File "site-packages\scrapy\crawler.py", line 326, in _get_spider_loader
File "site-packages\scrapy\utils\misc.py", line 44, in load_object
File "importlib\__init__.py", line 126, in import_module
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'scrapy.spiderloader'
[14128] Failed to execute script runspidy
webbeing
    Possible duplicate of [How to make a Python script standalone executable to run without ANY dependency?](https://stackoverflow.com/questions/5458048/how-to-make-a-python-script-standalone-executable-to-run-without-any-dependency) – André Roggeri Campos May 15 '18 at 23:30

1 Answer

To keep everything in a single Python file that can be run with just:

python script.py

you can keep the code you have and add a few things:

import scrapy
from scrapy.crawler import CrawlerProcess

from scrapy.utils.project import get_project_settings
# useful if you have settings.py 
settings = get_project_settings()

# Your code
class QuotesSpider(scrapy.Spider): 
  name = "quotes"
  def start_requests(self):
    ...    

# Create a crawler process using the project settings
process = CrawlerProcess(settings)
process.crawl(QuotesSpider)
process.start()  # blocks here until the crawl finishes

Save that as script.py. Then build it with PyInstaller:

pyinstaller --onefile script.py

This generates the bundle in a subdirectory called dist.
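If the bundled executable then fails with a ModuleNotFoundError for a scrapy.* module, as in the traceback in the question, the missing modules can be passed to PyInstaller with its --hidden-import option. Typing dozens of them by hand is tedious, so here is a minimal sketch that generates the flags with the standard library. The helper names (list_submodules, hidden_import_args, build_command) are made up for illustration, and I am assuming that listing every submodule of the package is acceptable for your build.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the dotted names of a package and all of its submodules."""
    package = importlib.import_module(package_name)
    names = [package_name]
    # walk_packages imports sub-packages in order to descend into them;
    # the onerror callback makes it skip any that fail to import.
    for info in pkgutil.walk_packages(
        package.__path__, prefix=package_name + ".", onerror=lambda name: None
    ):
        names.append(info.name)
    return names


def hidden_import_args(package_name):
    """Build a flat list of '--hidden-import NAME' arguments."""
    args = []
    for name in list_submodules(package_name):
        args += ["--hidden-import", name]
    return args


def build_command(script, package_name):
    """Assemble the full pyinstaller command line as a list of strings."""
    return ["pyinstaller", "--onefile", *hidden_import_args(package_name), script]


# Hypothetical usage, run in the environment where Scrapy is installed:
#   print(" ".join(build_command("script.py", "scrapy")))
```

The printed command can be pasted into the shell or handed to subprocess.run. Recent PyInstaller versions also appear to have a --collect-submodules option that does much the same thing, if your version supports it. Data files such as the mime.types file mentioned in the comments are a separate matter; PyInstaller's --add-data option is the usual route for those.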

Evhz
  • Hello, thank you. I tried the above and got the errors below. – webbeing May 16 '18 at 03:14
  • I'm sorry, the run-time error is the one shown in the original question above. – webbeing May 16 '18 at 03:24
  • I used --hidden-import to resolve the missing module "scrapy.spiderloader", which led to more missing modules, all starting with "scrapy.*...". Thanks for the help. – webbeing May 16 '18 at 03:31
  • I'm sorry, the above comment implies that I got it working. I haven't yet. I added around 50 --hidden-import options to resolve missing modules. Now I get "no such file ... mime.types". – webbeing May 16 '18 at 17:01
  • Regarding `ModuleNotFoundError: No module named 'scrapy.spiderloader'`: first check carefully that the script works before running `pyinstaller`; if it does, then maybe the module is hidden during the build. It's hard to say more without details about the project structure, the code, and the --hidden-import options you used. But look closely: if the script works with `python`, then the problem is in the build process. You'll get it; if not, create another question about the build. Either way, trace the build; the cause must be there. – Evhz May 16 '18 at 22:31
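On why a module can end up "hidden" during the build: the traceback in the question passes through scrapy.utils.misc.load_object and importlib.import_module, which suggests the module name reached the importer as a string at run time. A bundler that finds dependencies by reading import statements has nothing to follow in that case. The sketch below is not Scrapy's code, only an illustration of the pattern; the SETTINGS dict and its LOADER_CLASS key are stand-ins, and a standard-library class is used so the example runs without Scrapy.

```python
import importlib


def load_object(path):
    """Import and return the object named by a dotted path like 'pkg.mod.Name'."""
    module_path, _, name = path.rpartition(".")
    # The module name exists only as a string here, so no import statement
    # anywhere in the source mentions it.
    module = importlib.import_module(module_path)
    return getattr(module, name)


# Stand-in for a settings value (an assumption, not Scrapy's real default).
SETTINGS = {"LOADER_CLASS": "json.decoder.JSONDecoder"}

loader_cls = load_object(SETTINGS["LOADER_CLASS"])
print(loader_cls.__name__)
```

Under plain python this works because the module is on disk. Inside a frozen executable the same call raises ModuleNotFoundError unless the module was bundled explicitly, which is consistent with the script running fine before the build and failing after it.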