4

I have a project deployed on Scrapinghub, but I do not have any copy of its code.

How can I download the whole project's code from Scrapinghub to my local machine?

Umair Ayub

2 Answers

6

I was able to download the project code using:

shub fetch-eggs project_id_here

Here, project_id_here is the project ID, which appears in the browser URL when the project is open.

The resulting file will be a *.egg. Extract it like a ZIP file using WinRAR or any other archive tool.
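If you would rather not use a GUI tool, the extraction step can be sketched in Python, since the standard-library zipfile module can read ZIP-format archives such as an egg. This is only a minimal sketch: the function name extract_egg and the file names in the usage comment are hypothetical, not anything produced by shub.

```python
import zipfile
from pathlib import Path


def extract_egg(egg_path, dest_dir):
    """Unpack a ZIP-format .egg into dest_dir and return the archived file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(egg_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Hypothetical usage (file names are placeholders, adjust to what you downloaded):
#   for name in extract_egg("project.egg", "project_src"):
#       print(name)
```

The returned list of names is a quick way to check that the spiders and settings modules you expect are actually in the archive.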

Additional note: shub does not have user-friendly error messages. I was once logged into shub with one account while trying to download a project that belonged to a different account, so make sure you are logged into the same Scrapinghub account that owns the project you are trying to download.

Umair Ayub
-1

As far as I know, there is currently no public API for retrieving your project source code from Scrapy Cloud. (Correct me if I'm wrong.)
However, it is still possible to retrieve your project source code without additional privileges.

While a job is running, the project-related files are located under the /app path:

job-<some-job-id>:/app$ ls -la /app
total 48
drwxr-xr-x  5 root   root     4096 Jul 27 17:13 .
drwxr-xr-x 82 root   root     4096 Jul 28 04:09 ..
-rw-r--r--  1 root   root    26695 Jul 27 17:13 __main__.egg
drwxr-xr-x  2 nobody nogroup  4096 May 23 07:34 addons_eggs
drwxr-xr-x  2 nobody nogroup  4096 Jul 24 14:27 python
-rw-r--r--  1 root   root       14 Jul 24 14:27 requirements.txt

The file __main__.egg contains all of your project's source code.

Thus you may:

  1. Pick a currently running job and visit its console at: https://app.scrapinghub.com/p/[project_id]/[spider_id]/[job_id]/console
  2. Send the .egg file somewhere you can retrieve it later, e.g. curl http://IP-address-of-your-own-server:8888/retrieve-file --data-binary @/app/__main__.egg (assuming you have prepared a service to receive the data).
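For the receiving service in step 2, something along these lines might be enough. It is only a sketch built on Python's standard http.server module, not an official Scrapinghub tool: the class name EggReceiver, the helper make_server and the output file name are all hypothetical, while the port 8888 and the /retrieve-file path simply mirror the curl example above. It does no authentication, so it should only run for as long as the transfer takes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class EggReceiver(BaseHTTPRequestHandler):
    # Hypothetical output file name; change it to wherever you want the egg saved.
    output_path = "received_main.egg"

    def do_POST(self):
        # Only accept the path used in the curl example.
        if self.path != "/retrieve-file":
            self.send_error(404)
            return
        # curl --data-binary sends the raw file bytes with a Content-Length header.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with open(self.output_path, "wb") as fh:
            fh.write(body)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"saved\n")


def make_server(host="0.0.0.0", port=8888):
    """Build (but do not start) the receiving server."""
    return HTTPServer((host, port), EggReceiver)


# To start receiving, run on your own server:
#   make_server().serve_forever()
```

Once the curl command in the job console returns, stop the server and treat the saved file like any other .egg archive.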

Alternatively, I suppose you could always contact Scrapinghub support for help.

starrify