
I don't know if there's already a question on this subject. My English isn't good enough to understand all the topics I've seen on stackoverflow.com about web scraping and running a spider from an exe file.

So I'm sorry if I'm asking a question that has already been answered somewhere, but: is it possible, after having written my spider, to launch it from an .exe file instead of with the `scrapy crawl xxx` command? Just by clicking on the .exe file, the computer would search for the items I want on the website I crawl, and give me a .csv or .json file. I've seen py2exe, but it seems to be about the output of my spider, and I don't understand it.

I hope I've been clear enough (it's not even clear in French in my head, and it's really hard to translate into English).

Thanks a lot for your help!!

P.Postrique
  • Hi, if it doesn't need to be an .exe, you could just write a script that does everything for you. You could call your Python script from a Windows shell script, for example: https://www.csie.ntu.edu.tw/~r92092/ref/win32/win32scripting.html – Aleksander Lidtke Jul 31 '17 at 02:46
  • Yes, the problem is I want to give this program to people who know nothing about programming. I just want the easiest way for them to use my spider. Do you think your solution is the easiest? – P.Postrique Jul 31 '17 at 02:50
  • I don't know what's easier *in your specific case*. But you can definitely write a simple .bat file that the user can just click. You can put `python3.5 yourPythonScript.py` in it and it'll execute your spider: https://stackoverflow.com/questions/4571244/creating-a-bat-file-for-python-script – Aleksander Lidtke Jul 31 '17 at 05:16

3 Answers


To run a Python script, you need a Python interpreter available on the machine.

So if you want to distribute your Python script (your spider), you need to make sure that your users have a working Python environment set up.

I

When you are dealing with technical people this is usually not a problem: just tell them they need to install Python 3.5 (or whatever you are using) and get the required modules with pip. They might even figure it out on their own.
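For example, a minimal sketch of that workflow (the `requirements.txt` file and the `items.csv` output name are just assumptions, adapt them to your project):

```shell
# Install the dependency the spider needs (Scrapy in this case)
pip install scrapy

# Or, if you ship a requirements.txt alongside your spider,
# your users can install everything in one step:
pip install -r requirements.txt

# Then they run the spider exactly as you do, writing the items to a file:
scrapy crawl xxx -o items.csv
```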

II

If you are dealing with non-technical users, you don't want to make their lives hard by requiring them to use the command line or to install all the dependencies.

Instead, you can provide them with a self-contained package that includes your script, the Python interpreter, and the required additional modules.

There are several tools that can create these packages for you, such as PyInstaller and py2exe.
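For example, a rough PyInstaller invocation might look like this (the script name `run_spider.py` is just an assumption; use whatever your entry-point script is called):

```shell
# Install PyInstaller into the same environment as your spider
pip install pyinstaller

# Bundle the entry-point script and its dependencies into a single .exe;
# the dist/ folder will contain the resulting executable
pyinstaller --onefile run_spider.py
```

Note that Scrapy loads some of its modules dynamically, so if the bundled .exe fails to start you may need PyInstaller's `--hidden-import` option or a custom spec file.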


juwi
  • You can just create your own module and specify the dependencies. Then, the users will be able to just `pip install` that one module and pip will install all the dependencies for them: https://python-packaging.readthedocs.io/en/latest/dependencies.html – Aleksander Lidtke Jul 31 '17 at 05:12
  • Thank you for your answer. However, I've already seen py2exe (which seems to be the solution to my problem) but I don't understand how it works. Could you explain it to me in simple words, please? It seems to work for a simple Python script but I don't know how it works for a Scrapy one... – P.Postrique Jul 31 '17 at 06:32
  • Please try PyInstaller first, as it seems the easiest option. You can find the instructions on the website http://www.pyinstaller.org/ – juwi Jul 31 '17 at 06:46
  • OK, with PyInstaller, a dist directory was created and a .exe file too. But it doesn't crawl the website I want to crawl. Any other helpful idea? – P.Postrique Jul 31 '17 at 06:56

I found the answer to my question here: How do I package a Scrapy script into a standalone application?

Thanks to the help provided by @juwi!! I think it's the easiest way for me...

P.Postrique

You must put the `CrawlerProcess` code under `if __name__ == '__main__':` to avoid running it twice and getting an error.

Here is another possibility for running your spider as a standalone script or executable:

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):
        # Your spider definition
        name = 'myspider'

    if __name__ == '__main__':
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
        })

        process.crawl(MySpider)
        process.start()  # the script will block here until the crawling is finished

You can find more information here: https://doc.scrapy.org/en/1.0/topics/practices.html
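Since the question also asks for a .csv or .json output file, the settings dict can additionally include a feed export. A minimal sketch (the quotes.toscrape.com URL, the spider name, and the `output.csv` filename are just assumptions for illustration; the `FEEDS` key requires Scrapy 2.1 or newer, older versions use `FEED_FORMAT`/`FEED_URI` instead):

```python
# Settings for CrawlerProcess: besides USER_AGENT, the FEEDS entry
# tells Scrapy to write every scraped item to output.csv automatically.
settings = {
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    # Requires Scrapy >= 2.1; on older versions use FEED_FORMAT/FEED_URI
    'FEEDS': {'output.csv': {'format': 'csv'}},
}

if __name__ == '__main__':
    # Imports live inside the guard so they only run when the script
    # is executed directly (e.g. from the packaged .exe).
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class QuotesSpider(scrapy.Spider):
        name = 'quotes'
        start_urls = ['http://quotes.toscrape.com/']

        def parse(self, response):
            # Yielded dicts are written to output.csv by the feed export
            for quote in response.css('div.quote'):
                yield {'text': quote.css('span.text::text').get()}

    process = CrawlerProcess(settings)
    process.crawl(QuotesSpider)
    process.start()  # blocks until crawling finishes; output.csv then exists
```

This way, double-clicking the packaged executable produces the .csv file the question asks for, with no command-line arguments needed.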

Ahmed Ellban