Here is what I have in the Scrapy spider:
phone_model = SmartphoneItem()
phone_model['sku_number'] = sku_number
# code omitted
cellular_network = CellularNetworkItem()
cellular_network['phone_model'] = phone_model
cellular_network['speed'] = speed
...
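In models.py:
from django.db import models

class CellularNetwork(models.Model):
    # unique=True on a ForeignKey behaves like a OneToOneField
    phone_model = models.ForeignKey('Smartphone', unique=True)
    ...

class Smartphone(models.Model):
    # note: max_length is ignored on an IntegerField
    sku_number = models.IntegerField(max_length=40, primary_key=True)
    ...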
and in items.py:
class CellularNetworkItem(DjangoItem):
    django_model = CellularNetwork

class SmartphoneItem(DjangoItem):
    django_model = Smartphone
But assigning phone_model = SmartphoneItem() obviously does not yield a Smartphone model instance, which is what the phone_model foreign key expects.
I am scraping a bunch of specifications and would prefer to validate the data at the source. Since the data has to be normalized for validation anyway, I'd prefer to kill two birds with one stone and update the database immediately.
The relational power of the ORM over plain Scrapy items seems to be what sells DjangoItem. But I can't seem to find any examples of exploiting this feature directly in the spider. I've seen some that use pipelines, handling objects case by case with isinstance pattern matching... I'm starting to wonder if I'd be better off just importing the Django models directly into the spider.
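For reference, the pipeline pattern mentioned above looks roughly like this. It is a minimal sketch that runs without Scrapy or Django: SmartphoneItemStub, CellularNetworkItemStub, and the in-memory smartphones/networks "tables" are hypothetical stand-ins. Only the process_item(self, item, spider) hook name and signature mirror what Scrapy calls on item pipelines; in a real project the upsert would go through the Django ORM instead.

```python
# Minimal sketch of an isinstance-dispatch pipeline.
# All classes and storage here are hypothetical stand-ins (no Scrapy/Django).


class SmartphoneItemStub(dict):
    """Stand-in for the SmartphoneItem DjangoItem."""


class CellularNetworkItemStub(dict):
    """Stand-in for the CellularNetworkItem DjangoItem."""


class DispatchPipeline:
    def __init__(self):
        # Hypothetical in-memory "tables": Smartphone rows keyed by primary key.
        self.smartphones = {}
        self.networks = []

    def _upsert_phone(self, phone_item):
        # Insert-or-update keyed on sku_number, keeping one row object per SKU
        # so every network row can point at the same phone row.
        row = self.smartphones.setdefault(phone_item['sku_number'], {})
        row.update(phone_item)
        return row

    def process_item(self, item, spider):
        # Same hook name/signature Scrapy uses for item pipelines.
        if isinstance(item, SmartphoneItemStub):
            self._upsert_phone(item)
        elif isinstance(item, CellularNetworkItemStub):
            # Replace the nested phone *item* with the stored phone *row*,
            # which is what a foreign key actually needs.
            row = self._upsert_phone(item['phone_model'])
            self.networks.append({'phone_model': row, 'speed': item['speed']})
        return item
```

Because both branches go through the same upsert, the phone row exists regardless of which item reaches the pipeline first, and network rows always reference that single row.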
UPDATE: Resolved.
Based on the question "How to update DjangoItem in Scrapy", I put this in the spider to return the Django model instance directly:
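def item_to_model(self, item):
    # Default to None so a plain Item reaches the TypeError below
    # instead of raising AttributeError inside getattr().
    model_class = getattr(item, 'django_model', None)
    if not model_class:
        raise TypeError("Item is not a `DjangoItem` or is misconfigured")
    return item.instance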
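One detail in this helper: getattr() called without a default raises AttributeError when the attribute is missing, so for a plain Item (which has no django_model at all) the TypeError branch is never reached. The sketch below shows the variant with a None default, written as a standalone function rather than a spider method; PlainItem and ConfiguredItem are hypothetical stand-ins, not Scrapy classes.

```python
# Sketch: getattr() needs a default for the "misconfigured" check to run.
# PlainItem and ConfiguredItem are hypothetical stand-ins, not Scrapy classes.


class PlainItem:
    """Has no django_model attribute at all, like a plain scrapy.Item."""


class ConfiguredItem:
    django_model = object  # placeholder for a Django model class

    @property
    def instance(self):
        return 'model instance'  # placeholder for the built model object


def item_to_model(item):
    # With the None default, a missing attribute falls through to the
    # explicit TypeError instead of escaping as an AttributeError.
    model_class = getattr(item, 'django_model', None)
    if not model_class:
        raise TypeError("Item is not a `DjangoItem` or is misconfigured")
    return item.instance
```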
UPDATE 2: Still broken:
Spoke too soon. While this provides a valid Django model instance, it is a new object rather than the instance I was after. Here is the relevant part of djangoitem.py:
@property
def instance(self):
    if self._instance is None:
        modelargs = dict((k, self.get(k)) for k in self._values
                         if k in self._model_fields)
        self._instance = self.django_model(**modelargs)
    return self._instance
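I don't fully understand what's going on, but as far as I can tell it builds a fresh, unsaved model object from the item's field values (cached in self._instance) rather than returning an existing one.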
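To make the quoted property concrete, here is a pure-Python mimic that runs without Scrapy or Django. FakeModel and FakeDjangoItem are hypothetical stand-ins, and the mimic reads fields with self.items() where the real class iterates self._values; otherwise the lazy build-then-cache logic follows the excerpt. It shows that the model object is constructed from the item's own field values on first access and cached on that item, so two items carrying the same sku_number end up with two separate model objects.

```python
# Pure-Python mimic of the DjangoItem.instance property quoted above.
# FakeModel and FakeDjangoItem are hypothetical stand-ins (no Django needed).


class FakeModel:
    """Stands in for a Django model class: just stores keyword fields."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class FakeDjangoItem(dict):
    django_model = FakeModel
    _model_fields = {'sku_number'}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._instance = None

    @property
    def instance(self):
        # Built lazily on first access from this item's field values, then
        # cached on this item; nothing is looked up in any database.
        if self._instance is None:
            modelargs = {k: v for k, v in self.items() if k in self._model_fields}
            self._instance = self.django_model(**modelargs)
        return self._instance


a = FakeDjangoItem(sku_number=12345)
b = FakeDjangoItem(sku_number=12345)
print(a.instance is a.instance)  # True: cached per item
print(a.instance is b.instance)  # False: each item builds its own object
```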