I'm trying to use this Scrapy add-on (or whatever it is): scrapyjs.

However, there are no install instructions and I'm new to Python. Is there something basic that I'm missing? How would I integrate this with a Scrapy project?

Note: I would prefer to use the Scrapy download handler rather than the middleware version, as it seems like it will be quicker to run (correct me if I'm wrong).

alecxe
Ole Henrik Skogstrøm

2 Answers

scrapyjs is not a regular Python package and is not registered on PyPI. First, clone the repository, then place the `scrapyjs` package somewhere on the PYTHONPATH, or in your Scrapy project directory, so that it is importable.
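As a rough sketch of what "importable" means here, the helper below tries to import a package by name, optionally after adding a directory to `sys.path`. The function name `ensure_importable` and the path `/path/to/cloned/scrapyjs-repo` are hypothetical and only for illustration; nothing in it is specific to scrapyjs.

```python
import importlib
import sys


def ensure_importable(package, extra_dir=None):
    """Return True if `package` can be imported.

    If `extra_dir` is given, it is put at the front of sys.path first,
    which is roughly what adding it to PYTHONPATH does.
    """
    if extra_dir is not None and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)
    try:
        importlib.import_module(package)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical location of the cloned repository; adjust to your setup.
    print(ensure_importable("scrapyjs", "/path/to/cloned/scrapyjs-repo"))
```

If this prints `False`, the package directory is probably not where Python is looking, or one of its dependencies failed to import.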

There are two options to integrate it with Scrapy: the download handler or the middleware.

The latter is much easier and cleaner, but would seriously affect performance, since each request would be handled in blocking mode.
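For orientation, here is a sketch of how the download handler and the middleware mentioned in this thread might be switched on in a project's `settings.py`. `DOWNLOAD_HANDLERS` and `DOWNLOADER_MIDDLEWARES` are standard Scrapy settings, but the scrapyjs class paths below are assumptions recalled from the project's README and should be checked against the repository you cloned.

```python
# settings.py (sketch; the scrapyjs class paths are assumptions, verify
# them against the modules in your cloned copy of the repository)

# Option 1: download handler. Assumed to also require the gtk2reactor
# patch to Scrapy.
DOWNLOAD_HANDLERS = {
    "http": "scrapyjs.dhandler.WebkitDownloadHandler",
    "https": "scrapyjs.dhandler.WebkitDownloadHandler",
}

# Option 2: middleware. To use it instead, uncomment this block and
# remove DOWNLOAD_HANDLERS above.
# DOWNLOADER_MIDDLEWARES = {
#     "scrapyjs.middleware.WebkitDownloader": 1,
# }
```

Only one of the two options should be active at a time.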

alecxe
  • This is basically the documentation on GitHub. Do I download the zip from GitHub and put it in the same folder as my Scrapy project, or what? I'm completely new to Python. Can you please write a more basic introduction? – Ole Henrik Skogstrøm Jan 02 '15 at 09:26
  • @OleHenrikSkogstrøm sure, updated the answer. Check it out. Thanks. – alecxe Jan 02 '15 at 09:32
  • Thank you, however I'm still not completely sure how to do this. How do I move the scrapyjs package under the sys.path? Can you give me an example? Sorry for being a complete noob at this :-/. – Ole Henrik Skogstrøm Jan 02 '15 at 09:34
  • @OleHenrikSkogstrøm try putting it near your `scrapy.cfg` file - this should do the trick. – alecxe Jan 02 '15 at 09:36
  • OK, I think I got it working now. However, I need to install the dependencies. Can you maybe take a look at this [question](http://stackoverflow.com/questions/19919985/how-to-install-python-gtk2-python-webkit-and-python-jswebkit-on-osx) and see if you are able to give a more complete answer? How do I install jswebkit, for example? – Ole Henrik Skogstrøm Jan 02 '15 at 09:52
  • @OleHenrikSkogstrøm oh yeah, I remember that "exciting adventure", please don't ask me to dive into it again :) – alecxe Jan 02 '15 at 10:13

To add to alecxe's answer: on Ubuntu/Debian systems, first install the dependencies (webkit, gtk2 and jswebkit):

sudo apt-get install python-jswebkit libwebkitgtk-1.0-0 python-webkit
sudo apt-get install python-gtk2 python-gnome2 python-glade2 python-gobject

If you are working in a virtualenv, you'll have to symlink the libraries you installed into it (replace `your-venv` with the path to your virtualenv):

mkdir -p your-venv/lib/python2.7/dist-packages
ln -s /usr/lib/python2.7/dist-packages/gtk-2.0* your-venv/lib/python2.7/dist-packages/
ln -s /usr/lib/python2.7/dist-packages/pygtk.pth your-venv/lib/python2.7/dist-packages/
ln -s /usr/lib/python2.7/dist-packages/gobject/ your-venv/lib/python2.7/dist-packages/
ln -s /usr/lib/python2.7/dist-packages/glib/ your-venv/lib/python2.7/dist-packages/
ln -s /usr/lib/python2.7/dist-packages/cairo your-venv/lib/python2.7/dist-packages/
ln -s /usr/lib/python2.7/dist-packages/webkit your-venv/lib/python2.7/dist-packages/
ln -s /usr/lib/python2.7/dist-packages/jswebkit.so your-venv/lib/python2.7/dist-packages/
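The same symlinking can be scripted. The sketch below is a minimal Python equivalent of the `ln -s` commands above. The function name `link_system_packages` is hypothetical and the directory names in the example call are placeholders. It links only exact entry names, so the `gtk-2.0*` glob would need to be expanded by hand.

```python
import os


def link_system_packages(names, system_dir, venv_dir):
    """Symlink each entry in `names` from system_dir into venv_dir.

    Entries missing from system_dir, or already present in venv_dir,
    are skipped. Returns the list of names that were linked.
    """
    if not os.path.isdir(venv_dir):
        os.makedirs(venv_dir)
    linked = []
    for name in names:
        src = os.path.join(system_dir, name)
        dst = os.path.join(venv_dir, name)
        if os.path.exists(src) and not os.path.lexists(dst):
            os.symlink(src, dst)
            linked.append(name)
    return linked


if __name__ == "__main__":
    # Placeholder paths; substitute your own virtualenv location.
    print(link_system_packages(
        ["pygtk.pth", "gobject", "glib", "cairo", "webkit", "jswebkit.so"],
        "/usr/lib/python2.7/dist-packages",
        "your-venv/lib/python2.7/dist-packages",
    ))
```

Running it a second time links nothing new, because existing links are left untouched.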

To use the patch method, find out where your Scrapy installation lives (if you don't already know):

python -c "import scrapy; print(scrapy.__file__)"

This prints the location of the compiled bytecode (`.pyc`) of Scrapy's `__init__.py`. Go to that directory and add the following lines to `__init__.py`:

from twisted.internet import gtk2reactor
gtk2reactor.install()
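Since the command above may print the path of a `.pyc` file, while the file to edit is the `.py` source next to it, here is a small sketch that maps one to the other. The helper name `source_path` and the example path are hypothetical, and the sketch assumes the Python 2 layout where the bytecode sits beside the source.

```python
import os


def source_path(module_file):
    """Map a module's __file__ value to the .py source file beside it.

    Compiled-bytecode paths (.pyc / .pyo) are rewritten to end in .py;
    any other path is returned unchanged.
    """
    root, ext = os.path.splitext(module_file)
    if ext in (".pyc", ".pyo"):
        return root + ".py"
    return module_file


if __name__ == "__main__":
    # Example path for illustration only.
    print(source_path("/usr/lib/python2.7/dist-packages/scrapy/__init__.pyc"))
```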
pad
  • Thank you again :). The problem is that I'm trying to get this to work on OSX and Windows as well (first OSX). It seems like no one knows how to install python-webkit and python-jswebkit on OSX... Are you able to find this out? – Ole Henrik Skogstrøm Jan 03 '15 at 10:45