I am working on a project where URLs are put into a Django model called `UrlItem`. The `models.py` file containing `UrlItem` is located in the `home` app. I typed `scrapy startproject scraper` in the same directory as the `models.py` file. Please see this image to better understand my Django project structure.

I understand how to create new `UrlItem`s from my scraper, but what if my goal is to get and iterate over my Django project's existing `UrlItem`s inside my spider's `start_requests(self)` method?
**What I have tried:**

1) I followed the marked solution in this question to see if my created `DjangoItem` already had the `UrlItem`s loaded. I tried to use `UrlItemDjangoItem.objects.all()` in my spider's `start_requests` method and realized that I would not be able to retrieve my Django project's `UrlItem`s this way.

2) In my spider I tried to import my `UrlItem`s like this: `from ...models import UrlItem`, and I received this error: `ValueError: attempted relative import beyond top-level package`.
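For reference, the relative import fails because the spider module is not a package member of the Django project, so Python cannot walk up to `models.py`. The usual workaround is to bootstrap Django inside the Scrapy project (add the Django project root to `sys.path`, set `DJANGO_SETTINGS_MODULE`, call `django.setup()`) and then use an absolute import. A minimal sketch follows; the path depth, the settings module name `myproject.settings`, and the `url` field on `UrlItem` are all placeholders for your actual layout:

```python
# scraper/scraper/spiders/url_spider.py -- environment-dependent sketch,
# not runnable as-is outside a configured Django + Scrapy project.
import os
import sys

import django

# Make the Django project importable from inside the Scrapy project.
# Adjust the number of ".." hops to match your directory structure.
sys.path.append(os.path.join(os.path.dirname(__file__), "..", "..", ".."))
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django.setup()  # must run before any model import

from home.models import UrlItem  # absolute import now works

import scrapy


class UrlSpider(scrapy.Spider):
    name = "url_spider"

    def start_requests(self):
        # values_list pulls just the url column instead of full model objects.
        for url in UrlItem.objects.values_list("url", flat=True):
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        pass
```

Note that `django.setup()` must be called before the model import, or Django raises `ImproperlyConfigured`.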
**Update**

After some consideration, I may end up having the Scrapy spider query my Django application's API to receive a list of the existing Django objects as JSON.
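That route keeps the two projects decoupled. A sketch of the parsing side, assuming a hypothetical endpoint that returns a JSON array of serialized `UrlItem` objects with a `url` field (both the endpoint path and the field name are assumptions):

```python
import json

# Hypothetical endpoint exposed by the Django app -- adjust to your URLconf.
API_URL = "https://example.com/api/url-items/"


def extract_urls(payload):
    """Pull the url field out of a JSON array of serialized UrlItem objects."""
    return [item["url"] for item in json.loads(payload)]


# Inside the spider, start_requests would first fetch the API, then fan out
# one request per stored URL, e.g.:
#
#     def start_requests(self):
#         yield scrapy.Request(API_URL, callback=self.parse_api)
#
#     def parse_api(self, response):
#         for url in extract_urls(response.text):
#             yield scrapy.Request(url, callback=self.parse)
```

For example, `extract_urls('[{"url": "https://a.com"}]')` returns `["https://a.com"]`.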