
I have a Scrapyd server running on Heroku. It works fine: the spider runs and connects to the databases without any issue.

I have set it to run every day using the scheduler in the ScrapydWeb UI.

However, every day the spider seems to disappear, and I have to redeploy it with scrapyd-deploy from my local machine to the server before it can be scheduled again. It never runs anything past that single day, even though I have set it to run every day at a certain time.
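
For context, my deploy-and-check loop looks roughly like this; the project and spider names below are placeholders for my real ones, and the app URL is made up:

```python
# Rough sketch of how I verify the deployment via the standard Scrapyd API.
# "myproject" and "myspider" are placeholder names; the URL is hypothetical.
import requests

SCRAPYD_URL = "https://my-scrapyd-app.herokuapp.com"

# listprojects.json and listspiders.json are built-in Scrapyd endpoints.
# Right after `scrapyd-deploy`, both calls show the project and spider;
# the next day the project no longer appears here.
projects = requests.get(f"{SCRAPYD_URL}/listprojects.json").json()
print(projects)  # e.g. {"status": "ok", "projects": ["myproject"]}

spiders = requests.get(
    f"{SCRAPYD_URL}/listspiders.json", params={"project": "myproject"}
).json()
print(spiders)  # e.g. {"status": "ok", "spiders": ["myspider"]}
```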

Does anyone know what might be the problem?

I am not sure what kind of details people need to see to help me resolve this. Please do ask and I shall provide what I can.

  • I'm not familiar with Scrapy so I'm not sure how it works, but is it controlled by files on the local filesystem? If so, Heroku's ephemeral filesystem won't be a good fit. Any changes will be lost whenever your dyno restarts, which happens frequently. Your guess of "every 24 hours" makes this quite likely. – ChrisGPT was on strike Aug 29 '22 at 12:51
  • Yes, the deployment is from the local filesystem. How could I keep the files on the Heroku app instead? Do you have any suggestions or alternatives I could use? – Suren Gunaseelan Aug 30 '22 at 07:07
  • [The sample application says](https://github.com/my8100/scrapyd-cluster-on-heroku#conclusion) "**Heroku apps would be restarted (cycled) at least once per day** and any changes to the local filesystem will be deleted, so you need the external database to persist data." I'm not totally clear how the author intends for you to use an external database, but there is a Redis server in the example. Are you using Redis? – ChrisGPT was on strike Aug 30 '22 at 18:17
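
Following up on the Redis suggestion in the last comment: the idea in the sample app seems to be that anything that must survive a dyno restart lives in an external store rather than on the dyno's filesystem. A minimal sketch, assuming a Heroku Redis add-on that exposes its connection string via the REDIS_URL config var (the key name here is made up for illustration):

```python
# Minimal sketch of keeping state in external Redis instead of the dyno's
# ephemeral filesystem. Assumes the REDIS_URL config var set by a Heroku
# Redis add-on; "spider:last_run" is a hypothetical key for illustration.
import os
import redis

r = redis.from_url(os.environ["REDIS_URL"])
r.set("spider:last_run", "2022-08-30T07:00:00Z")  # survives dyno restarts
print(r.get("spider:last_run"))  # b'2022-08-30T07:00:00Z'
```

Note this addresses persisted data only; whether the deployed project eggs themselves survive restarts is a separate question, since Scrapyd stores eggs on disk by default.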

0 Answers